Andrea Fassina – Bitmovin
https://bitmovin.com
Bitmovin provides adaptive streaming infrastructure for video publishers and integrators.

Encoding Definition and Adaptive Bitrate: Video Compression Basics
https://bitmovin.com/blog/encoding-definition-bitrates/ (20 Jan 2022)

For the latest information on everything video encoding, check out our ultimate guide: Video Encoding: The Big Streaming Technology Guide [2023]

Welcome to our encoding definition and adaptive bitrate guide.

This article is for anyone seeking a way into the world of Video Technology and Development, or for those of you looking for a quick refresher on the key terms that define the industry.

You’ll learn exactly what encoding is and some of the most important factors within the encoding process.

Let’s get started. 

What is a Codec?

Codec types inside a video clip

A codec is a device or a program intended to compress raw media files (ex: video, audio, and/or subtitles). There are multiple variations of codecs for each file format; common examples of video codecs include: H.264, HEVC, VP9 and AV1.

For audio there are: AAC, MP3 or Opus. A few essential codecs are visible in the image below:

The purpose of a codec is to efficiently and quickly transfer, store or play back a file on a device. The process of compressing these raw or uncompressed files with a codec is known as encoding.

What is Encoding?

An encoding, decoding and transcoding process

Encoding is the process of converting a raw video file into a compatible, compressed and efficient digital format using a codec. The new compressed file is capable of distribution across the web and playback in mobile or TV players. 

For example: a multimedia conglomerate could be tasked with distributing OTT content like Game of Thrones to a commuter’s mobile device in a region with slower internet/data speeds. This transmission requires a lot of back-end communication and multiple encodes; distributing an individual episode at its highest quality (the recording quality of the cameras) would be highly inefficient and expensive.

A solution is to run these ultra-high-quality videos through a video encoder during the processing phase. This packages the requested video files in a way that loses minimal quality during transmission, otherwise known as “semi-lossless compression”. 

From a technical perspective, an example of encoding would be converting a single uncompressed RGB 16-bit frame with a size of 12.4MB into a monochrome 8-bit frame with a size of 3.11MB.

If you are reading this from Europe, the standard is 25 frames per second (FPS), whereas videos in the US run at 29.97 FPS. So, for 60 seconds of video at 24 frames per second, encoding software would bring the total size of the video file down from 17.9GB to 2.9GB.
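As a sanity check on the figures above, here is a minimal Python sketch (assuming a 1920×1080 frame and decimal megabytes/gigabytes) of where the 12.4MB per frame and 17.9GB per minute come from:

```python
# Rough arithmetic behind the figures above (1920x1080, decimal MB/GB assumed).
WIDTH, HEIGHT = 1920, 1080
bytes_per_pixel_rgb16 = 3 * 2                                  # 3 channels x 16 bits
raw_frame_mb = WIDTH * HEIGHT * bytes_per_pixel_rgb16 / 1e6    # ~12.4 MB per frame
fps, seconds = 24, 60
raw_minute_gb = raw_frame_mb * fps * seconds / 1e3             # ~17.9 GB per minute
print(f"raw frame: {raw_frame_mb:.1f} MB, 60 s of raw video: {raw_minute_gb:.1f} GB")
```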

However, 3GB for 60 seconds of video may still be too much to stream from your phone while you are attempting to watch something on the bus to work, so further optimization is needed. 

What is Transcoding?

A more complex variation of encoding is transcoding, the process of converting from one codec to another (or to the same codec with different settings). Both decoding and encoding are necessary steps to achieve a successful transcode.

Transcoding is a standard practice for online video – the process of compressing an already compressed file – enabling consumers to access higher-quality experiences at significantly lower distribution costs.

In other words, more steps are necessary to deliver that high quality video to multiple devices. Additionally, an encoder can implement frame size reductions to maximize the perceived quality for the average consumer.

So, how does one further compress a data file?

Encoders like Bitmovin, which provide both API and GUI encoder products (as well as a command line interface), analyze and process all input video files.

Depending on which resolution is needed in the output file, a different video codec is used. The best video codec is one that encodes video for the specific resolution and format that optimizes for best perceived quality in the smallest possible size.

One of the standard metrics of measurement for video quality is the peak signal-to-noise ratio (PSNR): the comparison of “good data” against how much noise there is within the file; the higher the number, the better.

PSNR is measured in decibels (dB), like sound, and 80 dB is typically a good magnitude for quality.

However, not all video files are equal: sports and other dynamic videos are significantly larger in size and complexity than your average cartoon. As a result, encoders like Bitmovin utilize a further customizable solution, per-title encoding, which tunes each compression to achieve maximum quality and minimum size at the lowest cost.

What is Bitrate?

Having learned the definitions of Encoding and Transcoding and how they affect content quality, the next step is defining the basis of measurement for speed and cost in media transmission.

Bitrate, the industry-standard measure, is calculated (and charged) based on the number of bits per second that can be transmitted along a digital network. The more bits that can be processed per second, the faster and higher-quality the transfer – however, this usually comes at a higher cost.

All available bitrates and resolutions that video (and audio) segments are encoded in, as well as their server locations, are referenced in a text file defined by either the DASH or HLS protocols. These manifest files (.mpd for DASH, .m3u8 for HLS) are fed into a player; which protocol is used depends entirely on the device capabilities of the consumer.

Bitrate gives a value of how much of the video file (in bits) we can process over time while the video is playing back. However, it doesn’t always make sense to transfer the highest quality to every user and every device.

There are some who will consume the content on a cellular network while in motion (like for our friendly aforementioned commuter) and others who will consume that same content on a 4K TV with a fibre optic connection.

In addition, that same user may start viewing the content on the 4K TV and continue en route to their office on a mobile phone with a 3G network. 

Encoding & Bitrates in Action

During an encode, video and audio components are split (a reference is kept for the decode) into segments of around one second; the segment length can be arbitrary, but the maximum is typically 10 seconds.

Each of these video segments can be saved in a different quality (and frame size) by a video encoder.

The quality and size of the output video are set by the bitrate the distributing service selects. In a perfect world, the service provider will select, for each video, the bitrate that transfers to the end user without stuttering or buffering.
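To make that selection concrete, here is a minimal adaptive-bitrate sketch in Python; the ladder values and the throughput safety margin are illustrative assumptions, not Bitmovin defaults:

```python
# Hypothetical bitrate ladder in kilobits per second (illustrative values only).
LADDER_KBPS = [400, 800, 1600, 3000, 6000]

def pick_rendition(measured_throughput_kbps: float, safety_margin: float = 0.8) -> int:
    """Pick the highest rendition that fits under the measured throughput,
    keeping a safety margin so playback does not stall on small dips."""
    budget = measured_throughput_kbps * safety_margin
    usable = [b for b in LADDER_KBPS if b <= budget]
    return max(usable) if usable else min(LADDER_KBPS)

print(pick_rendition(2500))   # -> 1600 kbps on a ~2.5 Mbps connection
```

Real players combine throughput estimates with buffer level and device capabilities, but the core trade-off is the one above.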

You can find a chart of the standard bitrate ladder below as compared to the ladder for Bitmovin’s Per-Title Encoding solution:
encoding definition: A bitrate ladder table

Latest in Encoding Tech: VVC and VP9 codecs

The latest state-of-the-art encoding technology is Versatile Video Coding (VVC), an improvement over earlier codecs such as VP9 (2013). VVC improves the prediction of the parts (blocks) in a frame by looking at neighboring blocks and at how the same blocks behaved before the encode/transcode.

Factors that play into how VVC functions include: the motion of a block with respect to all others (motion compensation), changes of the block from how it looked in the past, and a prediction of how it will look in the future (temporal motion prediction).

Future of Video: Common Media Application Format (CMAF) 

The future of streaming is driven by CMAF, an encoding method that splits a video file into small chunks.

These chunked files are instantly playable by a consumer, unlike segmented files which need to be fully downloaded before playing.

Think of a flaky connection: high lag and long buffer times just to download 10 seconds of video. CMAF’s chunked transfer aims to solve that, and its Common Encryption format eases the deployment of Digital Rights Management technologies.

We hope you found this encoding definition and adaptive bitrate guide useful.  If you did, please don’t be afraid to share it on your social networks!


Developer Network Series: Everything you need to know about Video Compression
https://bitmovin.com/blog/video-compression-fundamentals/ (30 Apr 2020)

In the previous series of posts, we set a baseline of compression knowledge by defining how lossy compression algorithms work for images and breaking down the JPEG image standard. Now that we’ve covered our image compression basics, it’s time to move up to the next level with video – as you might have guessed – a time-ordered sequence of frames. Although compression is incredibly important for image distribution, it’s even more so for video. As seen with compressing images using the JPEG standard, the brute-force approach to video compression is MJPEG or Motion-JPEG, where Huffman encoding is applied frame by frame and each frame is the size of a JPEG image from a digital camera. Motion JPEG 2000 is heavily deployed in the industry and used across multiple commercial digital still cameras.
Given that the image compression techniques necessary to efficiently transmit a single image are already complex, additional compression techniques are available and applied for video. Video compression, otherwise known as video encoding, is the process of converting a raw format into a compressed one (ex: RGB16 or MJPEG to MP4). Encoding is the first step in the compression chain for internet delivery of video. Additional transcodes can be applied to a video – taking the output of the encoder and compressing it further – creating a better-compressed file for transmission. So, what happens in video compression? 

How to Compress Video

The basic idea behind video encoding is to exploit the high correlation between successive frames. To do so, predictive coding, otherwise known as Motion Estimation (ME), is applied to the preceding and (occasionally) succeeding frames to remove temporal redundancy by subtracting successive frames. Specifically, the first frame is coded like a full JPEG image, and the second frame is coded as the difference over time from the first frame – the residual error. If the difference from the previous image is small, a high compression ratio can be achieved. 
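A minimal numpy sketch of that idea, using random stand-in frames rather than real video: the decoder only needs the first frame plus the (small) residual to rebuild the second frame exactly.

```python
import numpy as np

# Predictive coding sketch: store frame 1 in full, then only the residual for frame 2.
rng = np.random.default_rng(0)
frame1 = rng.integers(0, 256, size=(240, 320), dtype=np.int16)
frame2 = np.clip(frame1 + rng.integers(-3, 4, size=frame1.shape), 0, 255).astype(np.int16)

residual = frame2 - frame1          # what a predictive encoder would code
reconstructed = frame1 + residual   # the decoder adds the residual back
assert np.array_equal(reconstructed, frame2)

# Small residuals (values clustered near zero) compress far better than a raw frame.
print("residual range:", residual.min(), residual.max())
```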
The second video compression algorithm (after Huffman encoding) is Motion Compensation, where the vectors determined in the ME step are used to describe the picture in terms of the transformation of a reference frame to the frame currently being analyzed (illustrated below). Each image is divided into NxN blocks (typically N = 16) and run through an algorithm called Motion Estimation. Break-up of a video frame into macroblocks:
Video Compression-Macroblock split-illustrated
Motion Estimation determines the motion vectors which describe the transformations from one picture to the next. One part of a picture is subtracted from the same part (block) in the previous image. This scan and subtraction process is illustrated below:
Video Compression-Motion Estimation Scans-illustrated
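A brute-force sketch of that block search in numpy; real encoders use much faster search strategies than this exhaustive scan, so treat it purely as an illustration of how a motion vector is found:

```python
import numpy as np

def best_match(block: np.ndarray, ref: np.ndarray, top: int, left: int, search: int = 8):
    """Exhaustive block matching: find the (dy, dx) offset in the reference frame
    that minimises the sum of absolute differences (SAD) to the target block."""
    n = block.shape[0]
    best = (0, 0, float("inf"))
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if 0 <= y and 0 <= x and y + n <= ref.shape[0] and x + n <= ref.shape[1]:
                sad = int(np.abs(block.astype(int) - ref[y:y + n, x:x + n].astype(int)).sum())
                if sad < best[2]:
                    best = (dy, dx, sad)
    return best   # motion vector (dy, dx) and its residual energy (SAD)
```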

Frame-Types in Video Compression

There are three main types of frames, which are grouped together in Groups of Pictures (GOP):

I-frames

Otherwise known as Independent frames, I-frames don’t need additional frames to be decoded – much like JPEG images. These frames are also the largest in size of the three and appear the least frequently in videos compared to the other frame types. The implementation schematics of I-frames do not need any motion compensation and the encoding process is the same as for the JPEG standard. The graph below illustrates this process: 
video compression-iframe encoding-illustrated
The contents of the target frame are divided up into different macroblocks and a Forward Discrete Cosine Transform (FDCT) is applied. The FDCT provides a smoothed-out frequency distribution of the macroblocks in terms of the pixel variations within a block. Once the pixels in the block have been converted to frequencies, the quantization step is applied to the macroblocks. In this step, the signal (frequency) is converted into a vector with different amplitude and size, depending on the spatial redundancy of the block. The less redundancy (more variation) in the block, the greater the size and amplitude described by the quantization vector; thus less compression is applied.

Next, entropy encoding converts the amplitude into bits along with the size identified in the quantization step, and encodes this information in a bitstream – usually starting from the Most Significant Bit. This is called progressive mode, and the image can then be decoded with successive approximation through iterative scans. This is why, when you are loading a JPEG image on the internet, it initially looks blocky and more details are added as the image loads. The average size on disk for a 320×240 I-frame is 14.8KB.

P-frames

Otherwise known as Predictive frames, P-frames are relative to the preceding P-frame OR the preceding I-frame. When Motion Compensation is applied to P-frames, the best macroblock (the one with the least variance from the target) in the previous frame is used to encode the target macroblock. The graph below illustrates this process:
video compression-pframe prediction-illustrated
The implementation schematics of P-frames need Motion Compensation to find the difference between the previous frames, as illustrated below:
video compression-pframe encoding-illustrated
In addition to the steps carried out for I-frames, P-frames need a few additional steps: P-frames introduce a feedback mechanism where the content of the target frame and all its macroblocks are encoded with reference to previous frames. The reference frame content is compared, from the point of view of motion estimation, with the current frame. This is done via de-quantization (DQ) and the Inverse Discrete Cosine Transform (IDCT). The differences are then encoded in the bitstream. The average size on disk for a (320×240) P-frame is 6KB.

B-frame

Otherwise known as Bidirectional frames, B-frames are relative to both preceding and succeeding I- or P-frames:
video compression-bframes-illustrated
More efficient than looking at full successive pictures, B-frames divide a frame into smaller parts or blocks, like in JPEG coding.
When Motion Compensation is applied to B-frames, an encoder seeks the best reference macroblock for compression based on previous and upcoming frames. The implementation schematics of B-frames need motion compensation to find the best-suited macroblock for the target macroblock by selecting which has the least variation from the target macroblock. 
video compression-bframes predictions-illustrated
All predictions are calculated using the same method, albeit more complex, as P-Frames. The implementation schematics of B-frames look like:
The B-frames implementation process is similar to the ones for I- and P-frames, but with the additional step of comparing successive frames. That extra step implies that B-frames take longer to process, because the extra reference data needs to be downloaded and decoded first. Although a delay is introduced at the computation step, the result is more efficient encodes and a faster distribution process. The average size on disk for a 320×240 B-frame is 765B.
Efficient video compressions use a mix of I, P and B frames – this mixed approach is illustrated within a bitstream here:
video compression-efficient encoding bitstream-illustrated
This process is even more efficient when searching for just the right parts of the image to subtract from the previous frame.

Step-by-Step Video Compression Process Summarized

To recap (a rough size estimate built from these figures follows the list):

  1. A video is a sequence of frames
  2. Each frame is split up into NxN macroblocks
  3. For each NxN macroblock DCT is applied (measuring brightness changes in a block)
  4. Quantization is applied on the resulting frequencies and then discretized
  5. Frames are encoded based on the differences identified during Motion Estimation
  6. There are 3 types of frames in video compression and video encoding:
    1. I-Frame: Independent frames, like a full JPEG image, are coded fully and individually – without referencing any other frames. About 15KB for an I-frame of 320×240.
    2. P-Frame: Predictive frames are frames that only measure the differences from previous frames. About 6KB for a P-frame of 320×240.
    3. B-Frame: Bidirectional frames are encoded based on the differences between the preceding and successive frames. About 750B for a B-Frame of 320×240.
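Using those per-frame averages, a rough Python estimate of how much a whole Group of Pictures costs compared to sending only I-frames (the GOP pattern below is an assumption for illustration, not a codec default):

```python
# Rough GOP size estimate from the per-frame averages above (320x240 frames).
I_KB, P_KB, B_KB = 15.0, 6.0, 0.75
gop = "I" + "BBP" * 4 + "BB"            # hypothetical 15-frame GOP: 1 I, 4 P, 10 B
size_kb = sum({"I": I_KB, "P": P_KB, "B": B_KB}[f] for f in gop)
print(f"{len(gop)}-frame GOP = {size_kb:.1f} KB "
      f"vs {len(gop) * I_KB:.0f} KB if every frame were an I-frame")
```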


Dev Network Series: Everything you need to know about image compression
https://bitmovin.com/blog/image-compression-jpeg/ (31 Mar 2020)

Everyone knows what a jpeg is! It’s an image of course. 
Yes and No – JPEG is the acronym that sits at the end of your picture file on your favorite device. You must be thinking, “Hold on one second, if it’s not the actual image, and it’s an acronym, what is it and what does it stand for?” Keep reading to find out:

Do I look like I know what a JPEG is? 

jpeg-meme
You may know the meme, but do you actually know the answer? JPEG is one of the world’s most common lossy image compression standards and stands for Joint Photographic Experts Group – the working group within the International Organization for Standardization (ISO/IEC) that defined it. It’s important to note that JPG and JPEG are the same; the former was used in Windows because file name extensions could only have three letters. 
Compressing an image in a lossy way means converting it from the raw format to JPEG, whereby the image is split into blocks (ex: 8×8 pixels), separated, and organized using the Discrete Cosine Transform (learn more about DCT). In this step of the data compression process, the image is described as a function in the spatial domain. The effectiveness of JPEG when using DCT as a transform coding (effectively mapping input values to output values, using less data) is based on three major observations: saliency in an image, spatial redundancy, and visual acuity.

Deciding if it “Needs More JPEG”

To measure the effectiveness and quality of lossy compression, you need to understand these observations in detail: 

Image Saliency

The focal point (or salience) of an image changes slowly across an image. It is unusual for pixel intensity values to vary multiple times in a small area, for example within an 8×8 image block. The majority of data in a pixel block is repeated; this is known as “spatial redundancy.” The same applies to the rest of the blocks in an image. That’s why it’s possible to represent an image with less data than the original content without sacrificing quality.

Spatial Redundancy

Psychophysical experiments suggest that people are much less likely to notice the loss of very high spatial frequency components than the loss of lower frequency components. For example, high-frequency components can be white pixels in consecutive image blocks on a white background, whereas low-frequency components like facial features in close up scenes are much more noticeable. Thus, high spatial frequency components are exploited by applying the DCT on them to considerably decrease the spatial redundancy in an image (and therefore in a video). The outcome of this process is a lower expenditure on bits when transmitting an image of similar quality as the original.

Visual Acuity

Visual Acuity is the measurement of a viewer’s ability to accurately distinguish closely spaced lines. For example, the difference between gray variants (black & white scale) can be distinguished with much greater accuracy and ease than color variants. Other than RGB, color in an image can be represented with three channels: one for luminance and two for chroma samples, YCbCr. This concept is applied in JPEG in the context of Chroma subsampling; which utilizes visual acuity to use more information when encoding brightness (luminance), as the human eye has a higher sensitivity towards luminance than for color differences (chrominance). The most common specification for chroma subsampling is a 4:2:2 ratio – as defined below:

Y:Cb:Cr – Y is the width in pixels of the sampling reference, Cb is the number of chroma values, and Cr is the number of changes in the chroma values.

In a 4:2:2 ratio, luminance has twice the sample rate of the chrominance. When the ratio is 4:4:4 – there is no chroma subsampling. The combined assumptions that image information varies relatively slowly, people are less likely to notice the loss of highly frequent components, and that luminance takes visual priority in human sight are the theoretical frameworks behind implementing a JPEG encoder.
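As an illustrative sketch of that split (using the widely published BT.601 conversion weights; real JPEG encoders handle ranges and rounding more carefully), here is what the luma/chroma separation and 4:2:2 subsampling look like in numpy:

```python
import numpy as np

def rgb_to_ycbcr(rgb: np.ndarray) -> np.ndarray:
    """Convert an HxWx3 RGB array (0-255) to YCbCr using the common BT.601 weights."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
    cr =  0.5 * r - 0.418688 * g - 0.081312 * b + 128
    return np.stack([y, cb, cr], axis=-1)

def subsample_422(ycbcr: np.ndarray):
    """4:2:2 subsampling: keep every luma sample, keep every second chroma column."""
    y = ycbcr[..., 0]
    cb = ycbcr[:, ::2, 1]   # half the horizontal chroma resolution
    cr = ycbcr[:, ::2, 2]
    return y, cb, cr
```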

Encoding images – processes defined

A full JPEG encoding process is best visualized using the following block diagram:
JPEG_Encoding Process_Illustrated
The process begins with a source image that’s prepared for encoding using chroma subsampling.

Discrete Cosine Transform

After the chroma subsample ratios are set, the image is split into 8×8 blocks:
JPEG_Chroma Subsample_Blocks_illustrated
Each block is processed independently of the others. A forward DCT is then applied individually to each of these blocks (a backward, or inverse, DCT is later used to reconstruct them). An image must be split into blocks and individually transformed; otherwise, applying high compression ratios will result in a blocky JPEG image. An encoder will then vectorize each block using a Zig-Zag scanning method (from the top left-hand corner to the bottom right-hand corner). The graphic below illustrates this process:
JPEG_Block Diagram
Applying the DCT splits the image, from the upper-left to the lower-right corner of each block, into AC and DC coefficients. The DCT coefficients are then divided by the corresponding values of a quantization matrix, as illustrated below:
JPEG_Quantization Matrix
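A minimal numpy sketch of the forward DCT on a single 8×8 block (an orthonormal type-II DCT with the JPEG-style level shift; a random block stands in for real pixels):

```python
import numpy as np

def dct_2d(block: np.ndarray) -> np.ndarray:
    """Orthonormal 8x8 forward DCT (type II), written out explicitly for clarity."""
    n = block.shape[0]
    k = np.arange(n)
    basis = np.cos((2 * k[None, :] + 1) * k[:, None] * np.pi / (2 * n))
    scale = np.full(n, np.sqrt(2 / n)); scale[0] = np.sqrt(1 / n)
    t = basis * scale[:, None]              # 1-D DCT matrix
    return t @ (block - 128.0) @ t.T        # level-shift, then transform rows and columns

rng = np.random.default_rng(1)
coeffs = dct_2d(rng.integers(0, 256, (8, 8)).astype(float))
print(coeffs[0, 0])   # DC coefficient: the overall brightness of the block
```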

Quantization 

Applying quantization rounds the coefficients to integers, thereby introducing loss into the compression process. Higher frequency components have higher quantization table integer values. JPEG has predefined quantization tables for the luminance and for the chrominance components. Regardless of the content type (image, audio, or video), the goal of an encoder is to maximize the compression ratio and minimize the perceptual loss in the transferred content. 
By separating frequency components, coefficients that quantize to 0 can be filtered out, thereby increasing the compression capacity. More 0s = more compression!
Next, the image is split into even smaller blocks, and any variations across these blocks are measured. The step after quantization is taking the quantized coefficients (the values extracted during quantization) and arranging them into a vector. 
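A small sketch of the quantize-then-scan step, using synthetic coefficients and a hypothetical quantization table (the values are illustrative, not the JPEG standard tables):

```python
import numpy as np

rng = np.random.default_rng(2)
u, v = np.meshgrid(np.arange(8), np.arange(8), indexing="ij")

# Synthetic DCT coefficients with energy concentrated at low frequencies, and a
# hypothetical quantization table that grows coarser towards high frequencies.
coeffs = 500.0 * np.exp(-0.6 * (u + v)) * rng.standard_normal((8, 8))
qtable = 16 + 6 * (u + v)

quantized = np.round(coeffs / qtable).astype(int)
print("zero coefficients after quantization:", np.count_nonzero(quantized == 0), "of 64")

# Zig-zag order groups the surviving low-frequency values at the front of the vector.
order = sorted(((r, c) for r in range(8) for c in range(8)),
               key=lambda rc: (rc[0] + rc[1], rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))
zigzag = [int(quantized[r, c]) for r, c in order]
```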

Coding Coefficients

Next, the encoder performs a Run-Length Coding (RLC) algorithm on the AC coefficients. Run-length coding replaces runs of values with a pair of integers: the run-length value, which specifies the number of zeros in the run, and the value of the next non-zero AC coefficient. Unlike AC, DC components use differential coding after vectoring, where the first DC coefficient is coded as is and the remaining values are coded as the difference from the previous block’s DC value. This process is illustrated below:
JPEG_Coding Coefficients_Illustrated
Once the RLC and differential codes are complete, both AC and DC coefficients undergo entropy coding. Entropy coding describes AC and DC values by size and amplitude, where size indicates the number of bits used and amplitude is the value of those bits. 
The final coding step that the AC and DC components pass through is the Huffman coding algorithm. Only “size” is Huffman coded, as smaller sizes occur more often in an image. Amplitude is not, as its value can vary so widely that Huffman coding would have no appreciable benefit. This method is applied for every block the image has been split up into. 
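The run-length idea is compact enough to show in a few lines of Python; this sketch covers only the (run, value) pairing and omits JPEG’s end-of-block and size/amplitude details:

```python
def run_length_code(ac_coeffs):
    """Pair each non-zero AC coefficient with the number of zeros preceding it,
    the core idea behind JPEG's (run, value) symbols (end-of-block handling omitted)."""
    pairs, run = [], 0
    for value in ac_coeffs:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    return pairs

print(run_length_code([12, 0, 0, -3, 0, 0, 0, 5, 0, 0]))   # [(0, 12), (2, -3), (3, 5)]
```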

Bitstreams

The resulting output needs to be put in a bitstream, also known as a binary sequence. There are several bitstream modes; JPEG supports a progressive mode, which quickly delivers a low-quality version of the image followed by higher-quality passes. The JPEG bitstream is illustrated below:
JPEG_Bitstream_Progressive Mode_Illustrated
Using a progressive mode, the most significant bits are downloaded first, followed by less significant ones. The result is an image that increases in quality as more blocks are progressively decoded from the Most Significant Bit (MSB) to the Least Significant Bit (LSB). This means that your browser is effectively doing multiple decodes as more information is added over time. An example of progressive JPEG decoding over time:
JPEG_Bitstream_Progressive Mode_Illustrated
Did you enjoy this post and do you want to learn more about compression algorithms?
This post is a part of our Video Developer Network, the home of free university-level courses about back-end video technology. Check the links below for relevant content:
[Landing Page] Video Developer Network
[Developer Network Lesson] Image Compression Standards
[Blog] Developer Network Series: Everything you need to know about Lossy Compression Algorithms

Lossy Compression Algorithms: Everything a Developer Needs to Know
https://bitmovin.com/blog/lossy-compression-algorithms/ (10 Mar 2020)

This is a comprehensive and detailed guide to Lossy Compression algorithms.

Here you will find everything a developer needs to know about this cutting-edge topic, in a guide created by some of the most advanced video technology experts and engineers.

Take your time to read through the whole resource, or just skip to the chapter that interests you using the table of contents below.


In this detailed post, we’ll cover the following topics:

  • An Introduction to Lossy Compression
  • Lossy Compression Ratios & Metrics in Digital Video
  • Perceptual Distortion
  • Peak-Signal-To-Noise Ratio (PSNR)
  • Lossy Compression: the “two” step process
    • Step 1: Quantization
    • Step 2: Transform coding
  • Wavelet Transformation
  • 2D Haar Transform

So, if you are looking to learn about lossy compression algorithms then you are in the right place.

What Are Compression Algorithms? 

When it comes to content distribution, especially in the form of video, the size of the content can make or break your business.

Even standard quality content files (video, audio, and text) end up taking up a lot of space, especially as applied to the transportation and/or distribution of the file.

To alleviate the potentially extremely high cost of storage and delivery everyone uses some form of compression algorithms to reduce file size.

The use of compression is of utmost importance to your success because it reduces the file size while maintaining the same user-perceived quality. At the time of this blog post, there are two variations of compression algorithms – lossy and lossless.

The focus of this post is lossy compression.

Introduction to Lossy Compression

Lossy compression means that compressed data is not exactly the same as it was originally, but a close approximation to it.

In most cases, the human eye wouldn’t even notice the difference between an original file and one compressed in a lossy way, but it yields a much higher compression ratio than the lossless compression, where an exact copy of the content is created. 

Lossy compression is one of the most important factors necessary in modern content distribution methods.

Without (lossy) compression the content we view every day wouldn’t be nearly as high quality as it actually is, and that’s just one of the pitfalls society might face without any kind of compression.

Other challenges viewers and distributors would face without (lossy) compression: slow load/buffer times, high delivery and storage costs, and limited distribution capabilities.

This blog acts as complementary material to our Video Developer Network – if you would like to learn about lossy compression algorithms in a classroom-style video format watch the video here.

What the Math?! Lossy Compression Ratios & Metrics in Digital Video

Lossy compression algorithms deliver high compression ratios and represent most multimedia compression algorithms in image, video, and audio content.

The goal of video and audio compression is to maximize the compression ratio and to minimize the distortion; a common trade-off in all compression technologies.

The standard formulation for lossy compression algorithms is “close approximation”, measured by establishing various distortion metrics that specify how close the compressed content is to the original – the most common measures are defined below:

Perceptual Distortion

Perceptual distortion is a well-known metric that has historically been used for assessing video quality. Rate–distortion theory provides the framework to study the trade-offs between the data rate and the distortion itself. 
Lossy Compression-Perceptual Distortion Graph
In the graph above, the Y-axis is the data rate and the X-axis the distortion level. If you have a high data rate and zero distortion, it is a lossless compression scheme.

As soon as cost/spend limitations are considered (in the form of bandwidth and/or storage), data reduction rates will increase and image distortion will appear. 

Mean Square Error

Another measure of distortion is the mean square error, MSE = (1/N) · Σᵢ (xᵢ − yᵢ)², where X is the input data sequence, Y is the output data sequence and N is the count of elements: 
Lossy Compression-Mean Square Error

Peak-Signal-To-Noise Ratio (PSNR)

Then there is the Peak-Signal-To-Noise ratio (PSNR) which is calculated by comparing the size of an error relative to the peak value of a signal.

The higher the PSNR, the better the video quality. Signal-to-noise ratios are typically expressed in decibel units (dB).

A good ratio will register values of around 80 dB.
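Both metrics are straightforward to compute; here is a minimal numpy sketch, with a synthetic frame and synthetic noise standing in for a real encode:

```python
import numpy as np

def mse(original: np.ndarray, compressed: np.ndarray) -> float:
    """Mean square error between two equally sized signals."""
    return float(np.mean((original.astype(float) - compressed.astype(float)) ** 2))

def psnr(original: np.ndarray, compressed: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB, assuming 8-bit content (peak value 255)."""
    err = mse(original, compressed)
    return float("inf") if err == 0 else 10 * np.log10(peak ** 2 / err)

rng = np.random.default_rng(3)
frame = rng.integers(0, 256, (240, 320))
noisy = np.clip(frame + rng.normal(0, 2, frame.shape), 0, 255)
print(f"MSE = {mse(frame, noisy):.2f}, PSNR = {psnr(frame, noisy):.1f} dB")
```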

Having explained the metrics used to evaluate the accuracy and quality of lossy compression, it’s time to discuss how the compression process works.

Lossy Compression: the “two” step process

Step 1: Quantization

The step that adds the most distortion is quantization.

Quantization is the process of mapping input from a large set (like an analog signal) to numerical output values in a smaller (usually finite) set.

There are 3 different forms of quantization: uniform, non-uniform, and vector (a small sketch of the uniform case follows this list).  

  1. Uniform scalar quantizer – subdivides the domain of the input into output values at regular intervals, with the exceptions at the two outer extremes. 
  2. Non-uniform quantizer – output values are not at equally spaced intervals. The reconstructed value that corresponds to each interval is taken, during quantization, as the midpoint of that interval; the length of each interval is referred to as the step size, which can be denoted by a symbol.
  3. Vector quantizer – high decoding complexity, output values can be distributed irregularly, not in a grid fashion – such as in the scalar quantizer case – because an output value represents a vector and not a scalar value.
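A minimal mid-rise uniform scalar quantizer sketch (illustrative only – codecs combine quantization with the transforms described next):

```python
import numpy as np

def uniform_quantize(signal: np.ndarray, step: float) -> np.ndarray:
    """Uniform scalar quantizer: map each sample to the midpoint of its interval."""
    index = np.floor(signal / step)    # which interval the sample falls into
    return (index + 0.5) * step        # reconstruct at the interval midpoint

x = np.linspace(-1.0, 1.0, 9)
print(uniform_quantize(x, step=0.5))   # coarse 0.5-wide steps introduce loss
```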

Step 2: Transform coding

Transform coding is the second step in Lossy Compression.

Transform coding is the process of taking groups (blocks) of consecutive samples from a source input – together covering all the pixels in a frame – and converting them into vectors that are then quantized.

The goal of transform coding is to decompose or transform the input signal into something easier to handle.

There is a good chance that there will be substantial correlations among neighboring samples; in other words, adjacent pixels are usually similar, so a compressor can remove some samples to reduce file size.

The range of pixels that can be removed without degrading quality irreparably is calculated by considering the most salient ones in a block.

For example: If Y is the result of a linear transform T of the input vector X in such a way that the components of Y are much less correlated, then Y can be coded more efficiently than X.

If most information is accurately described by the first few components of a transformed vector Y, then the remaining components can be coarsely quantized, or even set to zero, with little signal distortion. 

As correlation decreases between blocks and subsequent samples, the efficiency of the data signal encode increases.

Spatial frequency is one of the most important factors in transform coding because it describes how an image (and the pixels within it) changes across a block, in relation to neighboring pixel blocks.

The graphs here depict two variations:
Lossy Compression - Spatial Frequency Comparision charts
Spatial frequency indicates how many times pixel values change across an image block. It’s key to note – the human eye is less sensitive to higher spatial frequency components associated with an image than lower spatial frequency components.

If amplitude (learn more about frequency components metrics here) falls below a predefined threshold, it will not be detected by the average human eye.

A signal with high spatial frequency can be quantized more coarsely and therefore maintain quality at lower data rates than a signal with low spatial frequency, which will need more data to provide the user with high perceived quality.

Another important factor is the Discrete Cosine Transform (DCT), which measures how much the image content changes within a block in terms of the number of cycles of the cosine it contains.

The DCT is part of the encoding algorithm and converts pixel values in an image block to frequency values, which can be transmitted with lower amounts of data.

DCT is lossless – apart from rounding errors – and the spatial frequency components it produces are called coefficients. The DCT splits the signal into a DC (direct current) component and AC (alternating current) components.

With the IDCT or Inverse Discrete Cosine Transform, the original signal is reconstructed and can be decoded and played back. 

Step 2.5: Other Transformation Formats

Wavelet
An alternative method of lossy compression is wavelet transformation, which represents a signal with good resolution in both time and frequency and utilizes a set of functions, called wavelets, to decompose an input signal.

Wavelet coding works by repeatedly taking averages and differences, keeping the results from every step for different parts of the image; this is (almost) a multi-resolution analysis.

A wavelet transform creates progressively smaller summary images from the original, decreasing by a quarter of the size for each step. A great way to visualize wavelet coding is to consider a pyramid – stacking a full-size image, quarter-size image, sixteenth-size image, and so on, on top of each other.
Lossy Compression-Wavelet Transform sample
The image has gone through a process of subsampling (through the wavelet transformation algorithm) decreasing the size but aiming at maintaining the quality in smaller iterations.

The image on the right in the top left quadrant has a compressed representation of the full-scale image on the left, which can be reconstructed from the smaller one by applying the wavelet coding transformation inversely.

Another example of lossy compressing a white and black image is:
lossy-compression-visualized-doggo
2D Haar Transform

2D Haar Transform is the representation of a signal with a discrete non-differentiable (step) function – consider a function that represents on/off states of a device.

In the context of image decomposition, applying the 2D Haar Transform to a simple image looks like this:
Lossy-Compression-2DHaar Transform
The image on the left represents the pixel values of the image on the right, an 8 x 8 image.

Applying a 2D Haar Transform at the second level yields a further decrease of the image size:
Lossy-Compression- Wavelet vs 2D Haar comparison
The calculated differences and image decrease allow for the image to be compressed with less data while keeping an eye on quality.
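A minimal numpy sketch of one level of that decomposition (normalisation conventions vary between texts; this version simply averages, and the original block can be rebuilt exactly from the four sub-bands):

```python
import numpy as np

def haar2d_level(img: np.ndarray):
    """One level of a 2D Haar-style transform: an averaged approximation plus
    horizontal, vertical and diagonal difference sub-bands, each a quarter the size."""
    a = img[0::2, 0::2].astype(float)
    b = img[0::2, 1::2].astype(float)
    c = img[1::2, 0::2].astype(float)
    d = img[1::2, 1::2].astype(float)
    approx     = (a + b + c + d) / 4     # the smaller "summary" image
    horizontal = (a - b + c - d) / 4
    vertical   = (a + b - c - d) / 4
    diagonal   = (a - b - c + d) / 4
    return approx, horizontal, vertical, diagonal

img = np.arange(64, dtype=float).reshape(8, 8)   # stand-in 8x8 image
approx, *details = haar2d_level(img)
print(approx.shape)   # (4, 4): a quarter-size summary, as described above
```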

More compression means lower quality and higher quality means lower compression.

In the case of color images, the same applies:
Lossy-Compression-Color Example
In short, the goal of all compression algorithms is to achieve the highest possible compression ratio. For any video distributor, compression ratios come down to cost and quality considerations.

Which trade-off will yield the highest ROI? High compression and high quality at higher costs? The opposite? Or somewhere in the middle?

That’s for you to decide!

Did you enjoy this post? Check out our Video Developer Network for the full university-quality videos (including a lesson on Lossless Compression).


Did you know?

Bitmovin has a range of VOD services that can help you deliver content to your customers effectively.

Its variety of features allows you to create content tailored to your specific audience, without the stress of setting everything up yourself. Built-in analytics also help you make technical decisions to deliver the optimal user experience.

Why not try Bitmovin for Free and see what it can do for you.

 

What is Video Transcoding? The Video Compression Basics
https://bitmovin.com/blog/what-is-transcoding/ (20 Feb 2020)

What is Transcoding?

In the context of video – Transcoding refers to the process of compressing video files as much as possible at minimal quality loss to represent (and transfer) information by using less data.

Essentially, video transcoding online is the conversion of a video file from one format to a better-compressed version to ensure consumers can stream content without buffering and at the highest possible qualities.

It’s easy to get Transcoding mixed up with Encoding. A good encoding definition is ‘the process of converting a raw file into a compatible, compressed and efficient digital format using a codec’.

So, with that said, how does transcoding work? And how might it affect your everyday life?

Keep reading to find out!

What is Transcoding? Visualized

Your camera (or device), your content, and video transcoding in your home

Picture this scenario: You’ve recently returned home to your comfortable sofa from your latest adventure or hobby. For the past few hours, your eyes have been locked on your computer, uploading and editing the HD videos that you captured earlier. You’ve finally finished the editing process on your laptop from your GoPro, high-quality Kodak cam, and/or Apple, Android, Windows, or other devices (Bitmovin is OS-agnostic after all!) and you’re ready to download and share your latest creation.

The raw video files that you recorded on your device are significantly larger than your standard cloud storage, file sharing service, or social media platform can (or will) handle (pro-tip: check the raw size of a video file saved on the actual device – chances are, it’s huge!). Most raw HD video files amount to 18 GB of storage for every 60 seconds, based on an average of 1920 x 1080 pixels of a standard RGB 3x16bit uncompressed TIFF file.
To most, clicking export and then “share to social” is the final step to collecting those sweet sweet “likes.” But that’s not enough for you; you want to understand how and why the video content moves from one device to another. The first critical step is to hit “export and save”: most editing software (like GoPro Quik for GoPro, Capture NX-D for Nikon, Capture Cam Express for Sony devices) will ask you to specify an output folder or Network Access Storage (NAS) location (ex: a hard disk connected to the wifi), a video & audio codec configuration, and a container file format, like MP4. Congratulations! You’ve now completed the first step and unlocked all the elements required to complete the video transcode.  
After you’ve confirmed the export, your computer might heat up – given the size of your newly created content, that’s not unexpected; your computer will require a lot of temporary storage (in terms of gigabytes) per second of exported video. Depending on your computer’s specifications, the video transcode might take more or less time based on your RAM: lower RAM = slower transcodes, higher RAM = faster transcodes. From the consumer perspective, transcoding is as easy as the few clicks it takes to save and export to a new device. In short, transcoding is the process of converting one (almost losslessly) compressed video into a better-compressed video format. This is how video compression works as it moves from your computer to another local device. 
To learn more about Lossless compression (as opposed to lossy compression) and how it works, check out (the completely free) Video Developer Network: Lesson 1.2 What are Lossless Compression Algorithms
— Keep reading to learn how transcoding works on Smart and over-the-top (OTT) media devices —

From your device to your computer, now transcoded wirelessly to your SmartTV

So, you’ve exported the video file from your editor and you know you have some friends or family coming over, and you want to show off your final product on your brand new high definition SmartTV. The short answer to this conundrum is to connect your device to the TV with an HDMI cable, but let’s be realistic, do you want to leave your comfortable couch for such an archaic action? Definitely not, we are in 2020 after all! You plan on streaming your video using some of the smart features on your device, but how does that work?
The editing software (like Quik for GoPro) has already created a semi-compressed file on your computer, but chances are that regardless of the server that your software used (GPU-based with Plex or an outright transcoding solution), the file needs to be compressed and optimized further for streaming capabilities. This process of compressing multiple times across transitions is imperative to lowering buffering times and improving the user experience, as fewer resources (from the back-end) are required for your TV to decode and stream the content.
If your laptop/NAS has to load and send a very large file across your local network, it might not yield stable playback on the client/TV device, because it would have to transmit a lot of data (4K quality for 1 minute is ~400 MB). So the transcoding server can employ a conversion process to HLS or DASH (like DASH for Plex client-server communication or HLS & Smooth Streaming for Google Chromecast) to achieve better quality and more stable video streaming to your SmartTV (Try it yourself with a hosted Bitmovin player). The software on the TV must support what it receives from the transcoding server (like a Chromecast extension in your browser and the stick in the TV).
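For a feel of why that matters, a quick back-of-the-envelope calculation (decimal units assumed) turns “~400 MB per minute” into the sustained throughput the link would need for direct playback:

```python
# Rough arithmetic behind "4K quality for 1 minute is ~400 MB" (decimal units).
megabytes_per_minute = 400
mbps = megabytes_per_minute * 8 / 60      # megabits per second
print(f"~{mbps:.0f} Mbit/s of sustained throughput needed for direct playback")
```

Many home Wi-Fi links can burst that high but struggle to hold it steadily, which is exactly the gap the transcode-to-HLS/DASH step closes.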
Transcoding-OTT-Media
Using these extensions or Plex communications, a client device will support a video player that can playback the data received over Wifi. According to our 2020 Video Developer Report (illustrated below) the best video codec by device coverage is the H.264 codec, so that’s always a safe bet for which standard will operate across the most possible devices.

Video Codec Usage_Bitmovin Video Developer Report 2020_Bar Graph

The ultimate transcode – sharing your content with the world!

To stream a video in a living room environment, you’ll need – on top of what we described before – a transcoding server, which encodes the transcoded video and sends it to a player that meets the playback requirements of the device you want to consume the content on. For example, the player must be able to handle different types of streams (like ts or fmp4 for HLS), packaged in different protocols (like DASH or HLS). We are talking about a device that is capable, through an app, of playing back the encoded video. When transcoding for streaming, it is important to create a video file that meets the client device’s requirements for playback; unfortunately, most client devices do not support every codec or container format natively – ex: any 2011 SmartTV. Converting and compressing (video) files at the local level is a small sample of what transcoding for the highest perceived quality entails. Achieving stable playback on your Local Area Network (LAN) is much easier, conceptually at least, than it is to transcode a video online – or even live! Maintaining quality and limiting size is even more important for web-based environments, as giving way in either category will affect your viewer’s experience; be it lower-quality content or long buffer times. Unlike the (semi)straightforward process of transcoding your content locally, publishing your video online takes a lot more steps on the backend:
Laptop Transcode => Living Room (OTT) Transcode => Internet Transcode (+ Content Delivery Network) => Online!
In short, once your device has completed the steps to transcode to your Smart TV, it has two (and a half) additional steps:

  1. Compressing and converting for internet transport – uploading to a Content Delivery Network (CDN): servers at “edge” locations (server locations at the EDGE of different networks) near users, holding transcoded copies of your video that will further distribute it
  2. Delivery and playback on some stranger’s device (Yay internet!)

The edge locations allow for faster, higher-quality transfer of data, so that viewers of your video can see the great content that you are making without buffering, and as close to the real event as possible (especially if you’re live-streaming).

Author’s Practical Example: End-to-End (E2E) video transcoding for Bitmovin

Now that you have a clear picture in mind of a daily transcoding process in your own home, I’ll shift to an ongoing real-life experience of transcoding video content for our recently published Bitmovin Video Developer Network, the go-to resource for people wanting to learn (more) about video development. The network contains multiple videos with university grade content.
I’m personally tasked with publishing all produced videos online. Our first video, Lesson 1.0 introduction & motivation (5 min runtime), was transcoded using two different methodologies  (to test for efficiency):

  1. Standard H.264 profile encode at 1920×1080 pixels, resulting in 5 GB of video files 
  2. Per-title encode, resulting in 200MB of video files (smaller by a factor of roughly 20x)

To ensure that our educational resource is truly universally streamable to users around the world it was important to transcode the videos at qualities and bitrates that all users can consume without buffering, therefore we used Per-Title methodology for all videos moving forward.

Bitmovin’s Video Developer Network

Bitmovin-DevNetwork-Main-Image
Introduced to the general public as of December 2019, the Bitmovin Developer Network was established with the sole purpose of introducing developers (and other interested parties) to the world of video development. Between our university-grade video lessons (all courses are completely free), global vid tech meet-ups, and learning labs, the developer network is the true home to grow the industry knowledge and capabilities of modern and future video technologies.
Are you interested in joining the Developer Network? All you need to do is sign-up with your email address at the bottom of the Developer Network Homepage. We’ll keep you up-to-date with the latest courses, meetups, and events! 


Did you know?

Bitmovin has a range of VOD services that can help you deliver content to your customers effectively.
Its variety of features allows you to create content tailored to your specific audience, without the stress of setting everything up yourself. Built-in analytics also help you make technical decisions to deliver the optimal user experience.
Why not try Bitmovin for Free and see what it can do for you.
