Andrea Fassina – Bitmovin
https://bitmovin.com
Bitmovin provides adaptive streaming infrastructure for video publishers and integrators.

Encoding Definition and Adaptive Bitrate: Video Compression Basics
https://bitmovin.com/blog/encoding-definition-bitrates/ (20 Jan 2022)

For the latest information on everything video encoding, check out our ultimate guide: Video Encoding: The Big Streaming Technology Guide [2023]

Welcome to our encoding definition and adaptive bitrate guide.

This article is for anyone seeking a way into the world of Video Technology and Development, or for those of you looking for a quick refresher on the key terms that define the industry.

You’ll learn exactly what encoding is and some of the most important factors within the encoding process.

Let’s get started. 

What is a Codec?

Codec types inside a video clip

A codec is a device or a program intended to compress raw media files (ex: video, audio, and/or subtitles). There are multiple variations of codecs for each file format; common examples of video codecs include: H.264, HEVC, VP9 and AV1.

For audio there are: AAC, MP3 or Opus. A few essential codecs are visible in the image below:

The purpose of a codec is to efficiently and quickly transfer, store or play back a file on a device. The process of compressing these raw or uncompressed files with a codec is known as encoding.

What is Encoding?

An encoding, decoding and transcoding process

Encoding is the process of converting a raw video file into a compatible, compressed and efficient digital format using a codec. The new compressed file is capable of distribution across the web and playback in mobile or TV players. 

For example: a multimedia conglomerate could be tasked with distributing OTT content like Game of Thrones to a commuter’s mobile device in a region with slower internet/data speeds. This transmission requires a lot of back-end communication and multiple encodes; distributing an individual episode at its highest quality (the recording quality of the cameras) would be highly inefficient and expensive.

A solution is to run these ultra-high-quality videos through a video encoder during the processing phase. This packages the requested video files in a way that loses minimal quality during transmission, otherwise known as “semi-lossless compression”. 

From a technical perspective, an example of encoding would be converting a single uncompressed RGB 16-bit frame with a size of 12.4MB into a monochrome 8-bit frame with a size of 3.11MB.

If you are reading this from Europe, the standard is 25 frames per second (FPS), whereas videos in the US run at 29.97 FPS. So, for 60 seconds of video at 24 frames per second, encoding software would bring the total size of the video file down from 17.9GB to 2.9GB.
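As a sanity check on the figures above, here is a minimal Python sketch (assuming a 1920×1080 frame and decimal megabytes/gigabytes) of where the 12.4MB per frame and 17.9GB per minute come from:

```python
# Rough arithmetic behind the figures above (1920x1080, decimal MB/GB assumed).
WIDTH, HEIGHT = 1920, 1080
bytes_per_pixel_rgb16 = 3 * 2                                  # 3 channels x 16 bits
raw_frame_mb = WIDTH * HEIGHT * bytes_per_pixel_rgb16 / 1e6    # ~12.4 MB per frame
fps, seconds = 24, 60
raw_minute_gb = raw_frame_mb * fps * seconds / 1e3             # ~17.9 GB per minute
print(f"raw frame: {raw_frame_mb:.1f} MB, 60 s of raw video: {raw_minute_gb:.1f} GB")
```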

However, 3GB for 60 seconds of video may still be too much to stream from your phone while you are attempting to watch something on the bus to work, so further optimization is needed. 

What is Transcoding?

A more complex variation of encoding is transcoding, the process of converting from one codec to another (or to the same codec with different settings). Both decoding and encoding are necessary steps to achieve a successful transcode.

Transcoding is a standard practice for online video – the process of compressing an already compressed file – enabling consumers to access higher-quality experiences at significantly lower distribution costs.

In other words, more steps are necessary to deliver that high quality video to multiple devices. Additionally, an encoder can implement frame size reductions to maximize the perceived quality for the average consumer.

So, how does one further compress a data file?

Encoders like Bitmovin, which provide both API and GUI encoder products (as well as a command line interface), analyze and process all input video files.

Depending on which resolution is needed in the output file, a different video codec is used. The best video codec is one that encodes video for the specific resolution and format that optimizes for best perceived quality in the smallest possible size.

One of the standard metrics of measurement for video quality is the peak signal-to-noise ratio (PSNR): the comparison of “good data” against how much noise there is within the file; the higher the number, the better.

PSNR is measured in decibels (dB), like sound, and 80 dB is typically a good magnitude for quality.

However, not all video files are equal: sports and other dynamic videos are significantly larger in size and complexity than your average cartoon. As a result, encoders like Bitmovin utilize a further customizable solution, per-title encoding, which tunes each compression to achieve maximum quality and minimum size at the lowest cost.

What is Bitrate?

Having learned the definitions of Encoding and Transcoding and how they affect content quality, the next step is defining the basis of measurement for speed and cost in media transmission.

Bitrate, the industry-standard measure, is calculated (and charged) based on the number of bits per second that can be transmitted along a digital network. The more bits that can be processed per second, the faster and higher-quality the transfer – however, this usually comes at a higher cost.

All available bitrates and resolutions that video (and audio) segments are encoded in, as well as their server locations, are referenced in a text file defined by either the DASH or HLS protocols. These manifest files (.mpd for DASH, .m3u8 for HLS) are fed into a player; which protocol is used depends entirely on the device capabilities of the consumer.

Bitrate gives a value of how much of the video file (in bits) we can process over time while the video is playing back. However, it doesn’t always make sense to transfer the highest quality to every user and every device.

There are some who will consume the content on a cellular network while in motion (like for our friendly aforementioned commuter) and others who will consume that same content on a 4K TV with a fibre optic connection.

In addition, that same user may start viewing the content on the 4K TV and continue en route to their office on a mobile phone with a 3G network. 

Encoding & Bitrates in Action

During an encode, video and audio components are split (a reference is kept for the decode) into segments of around one second; the segment length can be arbitrary, but the maximum is typically 10 seconds.

Each of these video segments can be saved in a different quality (and frame size) by a video encoder.

The quality and size of the output video are set by the bitrate the distributing service selects. In a perfect world, the service provider will select, for each video, the bitrate that transfers to the end user without stuttering or buffering.
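To make that selection concrete, here is a minimal adaptive-bitrate sketch in Python; the ladder values and the throughput safety margin are illustrative assumptions, not Bitmovin defaults:

```python
# Hypothetical bitrate ladder in kilobits per second (illustrative values only).
LADDER_KBPS = [400, 800, 1600, 3000, 6000]

def pick_rendition(measured_throughput_kbps: float, safety_margin: float = 0.8) -> int:
    """Pick the highest rendition that fits under the measured throughput,
    keeping a safety margin so playback does not stall on small dips."""
    budget = measured_throughput_kbps * safety_margin
    usable = [b for b in LADDER_KBPS if b <= budget]
    return max(usable) if usable else min(LADDER_KBPS)

print(pick_rendition(2500))   # -> 1600 kbps on a ~2.5 Mbps connection
```

Real players combine throughput estimates with buffer level and device capabilities, but the core trade-off is the one above.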

You can find a chart of the standard bitrate ladder below as compared to the ladder for Bitmovin’s Per-Title Encoding solution:
encoding definition: A bitrate ladder table

Latest in Encoding Tech: VVC and VP9 codecs

The latest state-of-the-art encoding technology is Versatile Video Coding (VVC), an improvement over earlier codecs such as VP9 (2013). VVC improves the prediction of the parts (blocks) in a frame by looking at neighboring blocks and at how the same blocks behaved before the encode/transcode.

Factors that play into how VVC functions include: the motion of a block with respect to all others (motion compensation), changes of the block from how it looked in the past, and a prediction of how it will look in the future (temporal motion prediction).

Future of Video: Common Media Application Format (CMAF) 

The future of streaming is driven by CMAF, an encoding method that splits a video file into small chunks.

These chunked files are instantly playable by a consumer, unlike segmented files which need to be fully downloaded before playing.

Think of a flaky connection: high lag and long buffer times just to download 10 seconds of video. CMAF’s chunked transfer aims to solve that, and its Common Encryption format eases the deployment of Digital Rights Management technologies.

We hope you found this encoding definition and adaptive bitrate guide useful.  If you did, please don’t be afraid to share it on your social networks!


Developer Network Series: Everything you need to know about Video Compression
https://bitmovin.com/blog/video-compression-fundamentals/ (30 Apr 2020)

In the previous series of posts, we set a baseline of compression knowledge by defining how lossy compression algorithms work for images and breaking down the JPEG image standard. Now that we’ve covered our image compression basics, it’s time to move up to the next level with video – as you might have guessed – a time-ordered sequence of frames. Although compression is incredibly important for image distribution, it’s even more so for video. As seen with compressing images using the JPEG standard, the brute-force approach to video compression is MJPEG or Motion-JPEG, where Huffman encoding is applied frame by frame and each frame is the size of a JPEG image from a digital camera. Motion JPEG 2000 is heavily deployed in the industry and used across multiple commercial digital still cameras.
Given that the image compression techniques necessary to efficiently transmit a single image are already complex, additional compression techniques are available and applied for video. Video compression, otherwise known as video encoding, is the process of converting a raw format into a compressed one (ex: RGB16 or MJPEG to MP4). Encoding is the first step in the compression chain for internet delivery of video. Additional transcodes can be applied to a video – taking the output of the encoder and compressing it further – creating a better-compressed file for transmission. So, what happens in video compression? 

How to Compress Video

The basic idea behind video encoding is to exploit the high correlation between successive frames. To do so, predictive coding, otherwise known as Motion Estimation (ME), is applied to the preceding and (occasionally) succeeding frames to remove temporal redundancy by subtracting successive frames. Specifically, the first frame is coded like a full JPEG image, and the second frame is coded as the difference over time from the first frame – the residual error. If the difference from the previous image is small, a high compression ratio can be achieved. 
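A minimal numpy sketch of that idea, using random stand-in frames rather than real video: the decoder only needs the first frame plus the (small) residual to rebuild the second frame exactly.

```python
import numpy as np

# Predictive coding sketch: store frame 1 in full, then only the residual for frame 2.
rng = np.random.default_rng(0)
frame1 = rng.integers(0, 256, size=(240, 320), dtype=np.int16)
frame2 = np.clip(frame1 + rng.integers(-3, 4, size=frame1.shape), 0, 255).astype(np.int16)

residual = frame2 - frame1          # what a predictive encoder would code
reconstructed = frame1 + residual   # the decoder adds the residual back
assert np.array_equal(reconstructed, frame2)

# Small residuals (values clustered near zero) compress far better than a raw frame.
print("residual range:", residual.min(), residual.max())
```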
The second video compression algorithm (after Huffman encoding) is Motion Compensation, where the vectors determined in the ME step are used to describe the picture in terms of the transformation of a reference frame to the frame currently being analyzed (illustrated below). Each image is divided into NxN blocks (typically N = 16) and run through an algorithm called Motion Estimation. Break-up of a video frame into macroblocks:
Video Compression-Macroblock split-illustrated
Motion Estimation determines the motion vectors which describe the transformations from one picture to the next. One part of a picture is subtracted from the same part (block) in the previous image. This scan and subtraction process is illustrated below:
Video Compression-Motion Estimation Scans-illustrated
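A brute-force sketch of that block search in numpy; real encoders use much faster search strategies than this exhaustive scan, so treat it purely as an illustration of how a motion vector is found:

```python
import numpy as np

def best_match(block: np.ndarray, ref: np.ndarray, top: int, left: int, search: int = 8):
    """Exhaustive block matching: find the (dy, dx) offset in the reference frame
    that minimises the sum of absolute differences (SAD) to the target block."""
    n = block.shape[0]
    best = (0, 0, float("inf"))
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if 0 <= y and 0 <= x and y + n <= ref.shape[0] and x + n <= ref.shape[1]:
                sad = int(np.abs(block.astype(int) - ref[y:y + n, x:x + n].astype(int)).sum())
                if sad < best[2]:
                    best = (dy, dx, sad)
    return best   # motion vector (dy, dx) and its residual energy (SAD)
```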

Frame-Types in Video Compression

There are three main types of frames, which are grouped together in Groups of Pictures (GOP):

I-frames

Otherwise known as Independent frames, I-frames don’t need additional frames to be decoded – much like JPEG images. These frames are also the largest in size of the three and appear the least frequently in videos compared to the other frame types. The implementation schematics of I-frames do not need any motion compensation and the encoding process is the same as for the JPEG standard. The graph below illustrates this process: 
video compression-iframe encoding-illustrated
The contents of the target frame are divided up into different macroblocks and a Forward Discrete Cosine Transform (FDCT) is applied. The FDCT provides a smoothed-out frequency distribution of the macroblocks in terms of the pixel variations within a block. Once the pixels in the block have been converted to frequencies, the quantization step is applied to the macroblocks. In this step, the signal (frequency) is converted into a vector with different amplitude and size, depending on the spatial redundancy of the block. The less redundancy (more variation) in the block, the greater the size and amplitude described by the quantization vector; thus less compression is applied.

Next, entropy encoding converts the amplitude into bits along with the size identified in the quantization step, and encodes this information in a bitstream – usually starting from the Most Significant Bit. This is called progressive mode, and the image can then be decoded with successive approximation through iterative scans. This is why, when you are loading a JPEG image on the internet, it initially looks blocky and more details are added as the image loads. The average size on disk for a 320×240 I-frame is 14.8KB.

P-frames

Otherwise known as Predictive frames, P-frames are relative to the preceding P-frame OR the preceding I-frame. When Motion Compensation is applied to P-frames, the best macroblock (the one with the least variance from the target) in the previous frame is used to encode the target macroblock. The graph below illustrates this process:
video compression-pframe prediction-illustrated
The implementation schematics of P-frames need Motion Compensation to find the difference between the previous frames, as illustrated below:
video compression-pframe encoding-illustrated
In addition to the steps carried out for I-frames, P-frames need a few additional steps: P-frames introduce a feedback mechanism where the content of the target frame and all its macroblocks are encoded with reference to previous frames. The reference frame content is compared, from the point of view of motion estimation, with the current frame. This is done via de-quantization (DQ) and the Inverse Discrete Cosine Transform (IDCT). The differences are then encoded in the bitstream. The average size on disk for a (320×240) P-frame is 6KB.

B-frame

Otherwise known as Bidirectional frames, B-frames are relative to both preceding and succeeding I- or P-frames:
video compression-bframes-illustrated
More efficient than looking at full successive pictures, B-frames divide a frame into smaller parts or blocks, like in JPEG coding.
When Motion Compensation is applied to B-frames, an encoder seeks the best reference macroblock for compression based on previous and upcoming frames. The implementation schematics of B-frames need motion compensation to find the best-suited macroblock for the target macroblock by selecting which has the least variation from the target macroblock. 
video compression-bframes predictions-illustrated
All predictions are calculated using the same method, albeit more complex, as P-Frames. The implementation schematics of B-frames look like:
The B-frames implementation process is similar to the ones for I- and P-frames, but with the additional step of comparing successive frames. That extra step implies that B-frames take longer to process, because the extra reference data needs to be downloaded and decoded first. Although a delay is introduced at the computation step, the result is more efficient encodes and a faster distribution process. The average size on disk for a 320×240 B-frame is 765B.
Efficient video compressions use a mix of I, P and B frames – this mixed approach is illustrated within a bitstream here:
video compression-efficient encoding bitstream-illustrated
This process is even more efficient when searching for just the right parts of the image to subtract from the previous frame.

Step-by-Step Video Compression Process Summarized

To recap (a rough size estimate built from these figures follows the list):

  1. A video is a sequence of frames
  2. Each frame is split up into NxN macroblocks
  3. For each NxN macroblock DCT is applied (measuring brightness changes in a block)
  4. Quantization is applied on the resulting frequencies and then discretized
  5. Frames are encoded based on the differences identified during Motion Estimation
  6. There are 3 types of frames in video compression and video encoding:
    1. I-Frame: Independent frames, like a full JPEG image, are coded fully and individually – without referencing any other frames. About 15KB for an I-frame of 320×240.
    2. P-Frame: Predictive frames are frames that only measure the differences from previous frames. About 6KB for a P-frame of 320×240.
    3. B-Frame: Bidirectional frames are encoded based on the differences between the preceding and successive frames. About 750B for a B-Frame of 320×240.
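Using those per-frame averages, a rough Python estimate of how much a whole Group of Pictures costs compared to sending only I-frames (the GOP pattern below is an assumption for illustration, not a codec default):

```python
# Rough GOP size estimate from the per-frame averages above (320x240 frames).
I_KB, P_KB, B_KB = 15.0, 6.0, 0.75
gop = "I" + "BBP" * 4 + "BB"            # hypothetical 15-frame GOP: 1 I, 4 P, 10 B
size_kb = sum({"I": I_KB, "P": P_KB, "B": B_KB}[f] for f in gop)
print(f"{len(gop)}-frame GOP = {size_kb:.1f} KB "
      f"vs {len(gop) * I_KB:.0f} KB if every frame were an I-frame")
```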


Dev Network Series: Everything you need to know about image compression
https://bitmovin.com/blog/image-compression-jpeg/ (31 Mar 2020)

Everyone knows what a jpeg is! It’s an image of course. 
Yes and No – JPEG is the acronym that sits at the end of your picture file on your favorite device. You must be thinking, “Hold on one second, if it’s not the actual image, and it’s an acronym, what is it and what does it stand for?” Keep reading to find out:

Do I look like I know what a JPEG is? 

jpeg-meme
You may know the meme, but do you actually know the answer? JPEG is one of the world’s most common lossy image compression standards and stands for Joint Photographic Experts Group – the working group within the International Organization for Standardization (ISO/IEC) that defined it. It’s important to note that JPG and JPEG are the same; the former was used in Windows because file name extensions could only have three letters. 
Compressing an image in a lossy way means converting it from the raw format to JPEG, whereby the image is split into blocks (ex: 8×8 pixels), separated, and organized using the Discrete Cosine Transform (learn more about DCT). In this step of the data compression process, the image is described as a function in the spatial domain. The effectiveness of JPEG when using DCT as a transform coding (effectively mapping input values to output values, using less data) is based on three major observations: saliency in an image, spatial redundancy, and visual acuity.

Deciding if it “Needs More JPEG”

To measure the effectiveness and quality of lossy compression, you need to understand these observations in detail: 

Image Saliency

The focal point (or salience) of an image changes slowly across an image. It is unusual for pixel intensity values to vary multiple times in a small area, for example within an 8×8 image block. The majority of data in a pixel block is repeated; this is known as “spatial redundancy.” The same applies to the rest of the blocks in an image. That’s why it’s possible to represent an image with less data than the original content without sacrificing quality.

Spatial Redundancy

Psychophysical experiments suggest that people are much less likely to notice the loss of very high spatial frequency components than the loss of lower frequency components. For example, high-frequency components can be white pixels in consecutive image blocks on a white background, whereas low-frequency components like facial features in close up scenes are much more noticeable. Thus, high spatial frequency components are exploited by applying the DCT on them to considerably decrease the spatial redundancy in an image (and therefore in a video). The outcome of this process is a lower expenditure on bits when transmitting an image of similar quality as the original.

Visual Acuity

Visual Acuity is the measurement of a viewer’s ability to accurately distinguish closely spaced lines. For example, the difference between gray variants (black & white scale) can be distinguished with much greater accuracy and ease than color variants. Other than RGB, color in an image can be represented with three channels: one for luminance and two for chroma samples, YCbCr. This concept is applied in JPEG in the context of Chroma subsampling; which utilizes visual acuity to use more information when encoding brightness (luminance), as the human eye has a higher sensitivity towards luminance than for color differences (chrominance). The most common specification for chroma subsampling is a 4:2:2 ratio – as defined below:

Y:Cb:Cr – Y is the width in pixels of the sampling reference, Cb is the number of chroma values, and Cr is the number of changes in the chroma values.

In a 4:2:2 ratio, luminance has twice the sample rate of the chrominance. When the ratio is 4:4:4 – there is no chroma subsampling. The combined assumptions that image information varies relatively slowly, people are less likely to notice the loss of highly frequent components, and that luminance takes visual priority in human sight are the theoretical frameworks behind implementing a JPEG encoder.
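As an illustrative sketch of that split (using the widely published BT.601 conversion weights; real JPEG encoders handle ranges and rounding more carefully), here is what the luma/chroma separation and 4:2:2 subsampling look like in numpy:

```python
import numpy as np

def rgb_to_ycbcr(rgb: np.ndarray) -> np.ndarray:
    """Convert an HxWx3 RGB array (0-255) to YCbCr using the common BT.601 weights."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
    cr =  0.5 * r - 0.418688 * g - 0.081312 * b + 128
    return np.stack([y, cb, cr], axis=-1)

def subsample_422(ycbcr: np.ndarray):
    """4:2:2 subsampling: keep every luma sample, keep every second chroma column."""
    y = ycbcr[..., 0]
    cb = ycbcr[:, ::2, 1]   # half the horizontal chroma resolution
    cr = ycbcr[:, ::2, 2]
    return y, cb, cr
```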

Encoding images – processes defined

A full JPEG encoding process is best visualized using the following block diagram:
JPEG_Encoding Process_Illustrated
The process begins with a source image that’s prepared for encoding using chroma subsampling.

Discrete Cosine Transform

After the chroma subsample ratios are set, the image is split into 8×8 blocks:
JPEG_Chroma Subsample_Blocks_illustrated
Each block is processed independently of the others. A forward DCT is then applied individually to each of these blocks (a backward, or inverse, DCT is later used to reconstruct them). An image must be split into blocks and individually transformed; otherwise, applying high compression ratios will result in a blocky JPEG image. An encoder will then vectorize each block using a Zig-Zag scanning method (from the top left-hand corner to the bottom right-hand corner). The graphic below illustrates this process:
JPEG_Block Diagram
Applying the DCT splits the image, from the upper-left to the lower-right corner of each block, into AC and DC coefficients. The DCT coefficients are then divided by the corresponding values of a quantization matrix, as illustrated below:
JPEG_Quantization Matrix
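A minimal numpy sketch of the forward DCT on a single 8×8 block (an orthonormal type-II DCT with the JPEG-style level shift; a random block stands in for real pixels):

```python
import numpy as np

def dct_2d(block: np.ndarray) -> np.ndarray:
    """Orthonormal 8x8 forward DCT (type II), written out explicitly for clarity."""
    n = block.shape[0]
    k = np.arange(n)
    basis = np.cos((2 * k[None, :] + 1) * k[:, None] * np.pi / (2 * n))
    scale = np.full(n, np.sqrt(2 / n)); scale[0] = np.sqrt(1 / n)
    t = basis * scale[:, None]              # 1-D DCT matrix
    return t @ (block - 128.0) @ t.T        # level-shift, then transform rows and columns

rng = np.random.default_rng(1)
coeffs = dct_2d(rng.integers(0, 256, (8, 8)).astype(float))
print(coeffs[0, 0])   # DC coefficient: the overall brightness of the block
```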

Quantization 

Applying quantization rounds the coefficients to integers, thereby introducing loss into the compression process. Higher frequency components have higher quantization table integer values. JPEG has predefined quantization tables for the luminance and for the chrominance components. Regardless of the content type (image, audio, or video), the goal of an encoder is to maximize the compression ratio and minimize the perceptual loss in the transferred content. 
By separating frequency components, coefficients that quantize to 0 can be filtered out, thereby increasing the compression capacity. More 0s = more compression!
Next, the image is split into even smaller blocks, and any variations across these blocks are measured. The step after quantization is taking the quantized coefficients (the values extracted during quantization) and arranging them into a vector. 
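A small sketch of the quantize-then-scan step, using synthetic coefficients and a hypothetical quantization table (the values are illustrative, not the JPEG standard tables):

```python
import numpy as np

rng = np.random.default_rng(2)
u, v = np.meshgrid(np.arange(8), np.arange(8), indexing="ij")

# Synthetic DCT coefficients with energy concentrated at low frequencies, and a
# hypothetical quantization table that grows coarser towards high frequencies.
coeffs = 500.0 * np.exp(-0.6 * (u + v)) * rng.standard_normal((8, 8))
qtable = 16 + 6 * (u + v)

quantized = np.round(coeffs / qtable).astype(int)
print("zero coefficients after quantization:", np.count_nonzero(quantized == 0), "of 64")

# Zig-zag order groups the surviving low-frequency values at the front of the vector.
order = sorted(((r, c) for r in range(8) for c in range(8)),
               key=lambda rc: (rc[0] + rc[1], rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))
zigzag = [int(quantized[r, c]) for r, c in order]
```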

Coding Coefficients

Next, the encoder performs a Run-Length Coding (RLC) algorithm on the AC coefficients. Run-length coding replaces runs of values with a pair of integers: the run-length value, which specifies the number of zeros in the run, and the value of the next non-zero AC coefficient. Unlike AC, DC components use differential coding after vectoring, where the first DC coefficient is coded as is and the remaining values are coded as the difference from the previous block’s DC value. This process is illustrated below:
JPEG_Coding Coefficients_Illustrated
Once the RLC and differential codes are complete, both AC and DC coefficients undergo entropy coding. Entropy coding describes AC and DC values by size and amplitude, where size indicates the number of bits used and amplitude is the value of those bits. 
The final coding step that the AC and DC components pass through is the Huffman coding algorithm. Only “size” is Huffman coded, as smaller sizes occur more often in an image. Amplitude is not, as its value can vary so widely that Huffman coding would have no appreciable benefit. This method is applied for every block the image has been split up into. 
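The run-length idea is compact enough to show in a few lines of Python; this sketch covers only the (run, value) pairing and omits JPEG’s end-of-block and size/amplitude details:

```python
def run_length_code(ac_coeffs):
    """Pair each non-zero AC coefficient with the number of zeros preceding it,
    the core idea behind JPEG's (run, value) symbols (end-of-block handling omitted)."""
    pairs, run = [], 0
    for value in ac_coeffs:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    return pairs

print(run_length_code([12, 0, 0, -3, 0, 0, 0, 5, 0, 0]))   # [(0, 12), (2, -3), (3, 5)]
```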

Bitstreams

The resulting output needs to be put in a bitstream, also known as a binary sequence. There are several bitstream modes; JPEG supports a progressive mode, which quickly delivers a low-quality version of the image followed by higher-quality passes. The JPEG bitstream is illustrated below:
JPEG_Bitstream_Progressive Mode_Illustrated
Using a progressive mode, the most significant bits are downloaded first, followed by less significant ones. The result is an image that increases in quality as more blocks are progressively decoded from the Most Significant Bit (MSB) to the Least Significant Bit (LSB). This means that your browser is effectively doing multiple decodes as more information is added over time. An example of progressive JPEG decoding over time:
JPEG_Bitstream_Progressive Mode_Illustrated
Did you enjoy this post and do you want to learn more about compression algorithms?
This post is a part of our Video Developer Network, the home of free university-level courses about back-end video technology. Check the links below for relevant content:
[Landing Page] Video Developer Network
[Developer Network Lesson] Image Compression Standards
[Blog] Developer Network Series: Everything you need to know about Lossy Compression Algorithms

Lossy Compression Algorithms: Everything a Developer Needs to Know
https://bitmovin.com/blog/lossy-compression-algorithms/ (10 Mar 2020)

This is a comprehensive and detailed guide to Lossy Compression algorithms.

Here you will find everything a developer needs to know about this cutting-edge topic, in a guide created by some of the most advanced video technology experts and engineers.

Take your time to read through the whole resource, or just skip to the chapter that interests you using the table of contents below.


In this detailed post, we’ll cover the following topics:

  • An Introduction to Lossy Compression
  • Lossy Compression Ratios & Metrics in Digital Video
  • Perceptual Distortion
  • Peak-Signal-To-Noise Ratio (PSNR)
  • Lossy Compression: the “two” step process
    • Step 1: Quantization
    • Step 2: Transform coding
  • Wavelet Transformation
  • 2D Haar Transform

So, if you are looking to learn about lossy compression algorithms then you are in the right place.

What Are Compression Algorithms? 

When it comes to content distribution, especially in the form of video, the size of the content can make or break your business.

Even standard quality content files (video, audio, and text) end up taking up a lot of space, especially as applied to the transportation and/or distribution of the file.

To alleviate the potentially extremely high cost of storage and delivery everyone uses some form of compression algorithms to reduce file size.

The use of compression is of utmost importance to your success because it reduces the file size while maintaining the same user-perceived quality. At the time of this blog post, there are two variations of compression algorithms – lossy and lossless.

The focus of this post is lossy compression.

Introduction to Lossy Compression

Lossy compression means that compressed data is not exactly the same as it was originally, but a close approximation to it.

In most cases, the human eye wouldn’t even notice the difference between an original file and one compressed in a lossy way, but it yields a much higher compression ratio than the lossless compression, where an exact copy of the content is created. 

Lossy compression is one of the most important factors necessary in modern content distribution methods.

Without (lossy) compression the content we view every day wouldn’t be nearly as high quality as it actually is, and that’s just one of the pitfalls society might face without any kind of compression.

Other challenges viewers and distributors would face without (lossy) compression: slow load/buffer times, high delivery and storage costs, and limited distribution capabilities.

This blog acts as complementary material to our Video Developer Network – if you would like to learn about lossy compression algorithms in a classroom-style video format watch the video here.

What the Math?! Lossy Compression Ratios & Metrics in Digital Video

Lossy compression algorithms deliver high compression ratios and represent most multimedia compression algorithms in image, video, and audio content.

The goal of video and audio compression is to maximize the compression ratio and to minimize the distortion; a common trade-off in all compression technologies.

The standard formulation for lossy compression algorithms is “close approximation”, measured by establishing various distortion metrics that specify how close the compressed content is to the original – the most common measures are defined below:

Perceptual Distortion

Perceptual distortion is a well-known metric that has historically been used for assessing video quality. Rate–distortion theory provides the framework to study the trade-offs between the data rate and the distortion itself. 
Lossy Compression-Perceptual Distortion Graph
In the graph above, the Y-axis is the data rate and the X-axis the distortion level. If you have a high data rate and zero distortion, it is a lossless compression scheme.

As soon as cost/spend limitations are considered (in the form of bandwidth and/or storage), data reduction rates will increase and image distortion will appear. 

Mean Square Error

Another measure of distortion is the mean square error, MSE = (1/N) · Σᵢ (xᵢ − yᵢ)², where X is the input data sequence, Y is the output data sequence and N is the count of elements: 
Lossy Compression-Mean Square Error

Peak-Signal-To-Noise Ratio (PSNR)

Then there is the Peak-Signal-To-Noise ratio (PSNR) which is calculated by comparing the size of an error relative to the peak value of a signal.

The higher the PSNR, the better the video quality. Signal-to-noise ratios are typically expressed in decibel units (dB).

A good ratio will register values of around 80 dB.
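Both metrics are straightforward to compute; here is a minimal numpy sketch, with a synthetic frame and synthetic noise standing in for a real encode:

```python
import numpy as np

def mse(original: np.ndarray, compressed: np.ndarray) -> float:
    """Mean square error between two equally sized signals."""
    return float(np.mean((original.astype(float) - compressed.astype(float)) ** 2))

def psnr(original: np.ndarray, compressed: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB, assuming 8-bit content (peak value 255)."""
    err = mse(original, compressed)
    return float("inf") if err == 0 else 10 * np.log10(peak ** 2 / err)

rng = np.random.default_rng(3)
frame = rng.integers(0, 256, (240, 320))
noisy = np.clip(frame + rng.normal(0, 2, frame.shape), 0, 255)
print(f"MSE = {mse(frame, noisy):.2f}, PSNR = {psnr(frame, noisy):.1f} dB")
```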

Having explained the metrics used to evaluate the accuracy and quality of lossy compression, it’s time to discuss how the compression process works.

Lossy Compression: the “two” step process

Step 1: Quantization

The step that adds the most distortion is quantization.

Quantization is the process of mapping input from a large set (like an analog signal) to numerical output values in a smaller (usually finite) set.

There are 3 different forms of quantization: uniform, non-uniform, and vector (a small sketch of the uniform case follows this list).  

  1. Uniform scalar quantizer – subdivides the domain of the input into output values at regular intervals, with the exceptions at the two outer extremes. 
  2. Non-uniform quantizer – output values are not at equally spaced intervals. The reconstructed value that corresponds to each interval is taken, during quantization, as the midpoint of that interval; the length of each interval is referred to as the step size, which can be denoted by a symbol.
  3. Vector quantizer – high decoding complexity, output values can be distributed irregularly, not in a grid fashion – such as in the scalar quantizer case – because an output value represents a vector and not a scalar value.
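A minimal mid-rise uniform scalar quantizer sketch (illustrative only – codecs combine quantization with the transforms described next):

```python
import numpy as np

def uniform_quantize(signal: np.ndarray, step: float) -> np.ndarray:
    """Uniform scalar quantizer: map each sample to the midpoint of its interval."""
    index = np.floor(signal / step)    # which interval the sample falls into
    return (index + 0.5) * step        # reconstruct at the interval midpoint

x = np.linspace(-1.0, 1.0, 9)
print(uniform_quantize(x, step=0.5))   # coarse 0.5-wide steps introduce loss
```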

Step 2: Transform coding

Transform coding is the second step in Lossy Compression.

Transform coding is the process of taking groups (blocks) of consecutive samples from a source input – together covering all the pixels in a frame – and converting them into vectors that are then quantized.

The goal of transform coding is to decompose or transform the input signal into something easier to handle.

There is a good chance that there will be substantial correlations among neighboring samples; in other words, adjacent pixels are usually similar, so a compressor can remove some samples to reduce file size.

The range of pixels that can be removed without degrading quality irreparably is calculated by considering the most salient ones in a block.

For example: If Y is the result of a linear transform T of the input vector X in such a way that the components of Y are much less correlated, then Y can be coded more efficiently than X.

If most information is accurately described by the first few components of a transformed vector Y, then the remaining components can be coarsely quantized, or even set to zero, with little signal distortion. 

As correlation decreases between blocks and subsequent samples, the efficiency of the data signal encode increases.

Spatial frequency is one of the most important factors in transform coding because it describes how an image (and the pixels within it) changes across a block, in relation to neighboring pixel blocks.

The graphs here depict two variations:
Lossy Compression - Spatial Frequency Comparision charts
Spatial frequency indicates how many times pixel values change across an image block. It’s key to note – the human eye is less sensitive to higher spatial frequency components associated with an image than lower spatial frequency components.

If amplitude (learn more about frequency components metrics here) falls below a predefined threshold, it will not be detected by the average human eye.

A signal with high spatial frequency can be quantized more coarsely and therefore maintain quality at lower data rates than a signal with low spatial frequency, which will need more data to provide the user with high perceived quality.

Another important factor is the Discrete Cosine Transform (DCT), which measures how much the image content changes within a block in terms of the number of cycles of the cosine it contains.

The DCT is part of the encoding algorithm and converts pixel values in an image block to frequency values, which can be transmitted with lower amounts of data.

DCT is lossless – apart from rounding errors – and the spatial frequency components it produces are called coefficients. The DCT splits the signal into a DC (direct current) component and AC (alternating current) components.

With the IDCT or Inverse Discrete Cosine Transform, the original signal is reconstructed and can be decoded and played back. 

Step 2.5: Other Transformation Formats

Wavelet
An alternative method of lossy compression is wavelet transformation, which represents a signal with good resolution in both time and frequency and utilizes a set of functions, called wavelets, to decompose an input signal.

Wavelet coding works by repeatedly taking averages and differences, keeping the results from every step for different parts of the image; this is (almost) a multi-resolution analysis.

A wavelet transform creates progressively smaller summary images from the original, decreasing by a quarter of the size for each step. A great way to visualize wavelet coding is to consider a pyramid – stacking a full-size image, quarter-size image, sixteenth-size image, and so on, on top of each other.
Lossy Compression-Wavelet Transform sample
The image has gone through a process of subsampling (through the wavelet transformation algorithm) decreasing the size but aiming at maintaining the quality in smaller iterations.

The image on the right in the top left quadrant has a compressed representation of the full-scale image on the left, which can be reconstructed from the smaller one by applying the wavelet coding transformation inversely.

Another example of lossy compressing a white and black image is:
lossy-compression-visualized-doggo
2D Haar Transform

2D Haar Transform is the representation of a signal with a discrete non-differentiable (step) function – consider a function that represents on/off states of a device.

In the context of image decomposition, applying the 2D Haar Transform to a simple image looks like this:
Lossy-Compression-2DHaar Transform
The image on the left represents the pixel values of the image on the right, an 8 x 8 image.

Applying a 2D Haar Transform at the second level yields a further decrease of the image size:
Lossy-Compression- Wavelet vs 2D Haar comparison
The calculated differences and image decrease allow for the image to be compressed with less data while keeping an eye on quality.
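A minimal numpy sketch of one level of that decomposition (normalisation conventions vary between texts; this version simply averages, and the original block can be rebuilt exactly from the four sub-bands):

```python
import numpy as np

def haar2d_level(img: np.ndarray):
    """One level of a 2D Haar-style transform: an averaged approximation plus
    horizontal, vertical and diagonal difference sub-bands, each a quarter the size."""
    a = img[0::2, 0::2].astype(float)
    b = img[0::2, 1::2].astype(float)
    c = img[1::2, 0::2].astype(float)
    d = img[1::2, 1::2].astype(float)
    approx     = (a + b + c + d) / 4     # the smaller "summary" image
    horizontal = (a - b + c - d) / 4
    vertical   = (a + b - c - d) / 4
    diagonal   = (a - b - c + d) / 4
    return approx, horizontal, vertical, diagonal

img = np.arange(64, dtype=float).reshape(8, 8)   # stand-in 8x8 image
approx, *details = haar2d_level(img)
print(approx.shape)   # (4, 4): a quarter-size summary, as described above
```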

More compression means lower quality and higher quality means lower compression.

In the case of color images, the same applies:
Lossy-Compression-Color Example
In short, the goal of all compression algorithms is to achieve the highest possible compression ratio. For any video distributor, compression ratios come down to cost and quality considerations.

Which trade-off will yield the highest ROI? High compression and high quality at higher costs? The opposite? Or somewhere in the middle?

That’s for you to decide!

Did you enjoy this post? Check out our Video Developer Network for the full university-quality videos (including a lesson on Lossless Compression).


Did you know?

Bitmovin has a range of VOD services that can help you deliver content to your customers effectively.

Its variety of features allows you to create content tailored to your specific audience, without the stress of setting everything up yourself. Built-in analytics also help you make technical decisions to deliver the optimal user experience.

Why not try Bitmovin for Free and see what it can do for you.

 

What is Video Transcoding? The Video Compression Basics
https://bitmovin.com/blog/what-is-transcoding/ (20 Feb 2020)

What is Transcoding?

In the context of video – Transcoding refers to the process of compressing video files as much as possible at minimal quality loss to represent (and transfer) information by using less data.

Essentially, video transcoding online is the conversion of a video file from one format to a better-compressed version to ensure consumers can stream content without buffering and at the highest possible qualities.

It’s easy to get Transcoding mixed up with Encoding. A good encoding definition is ‘the process of converting a raw file into a compatible, compressed and efficient digital format using a codec’.

So, with that said, how does transcoding work? And how might it affect your everyday life?

Keep reading to find out!

What is Transcoding? Visualized

Your camera (or device), your content, and video transcoding in your home

Picture this scenario: You’ve recently returned home to your comfortable sofa from your latest adventure or hobby. For the past few hours, your eyes have been locked on your computer, uploading and editing the HD videos that you captured earlier. You’ve finally finished the editing process on your laptop from your GoPro, high-quality Kodak cam, and/or Apple, Android, Windows, or other devices (Bitmovin is OS-agnostic after all!) and you’re ready to download and share your latest creation.

The raw video files that you recorded on your device are significantly larger than your standard cloud storage, file sharing service, or social media platform can (or will) handle (pro-tip: check the raw size of a video file saved on the actual device – chances are, it’s huge!). Most raw HD video files amount to 18 GB of storage for every 60 seconds, based on an average of 1920 x 1080 pixels of a standard RGB 3x16bit uncompressed TIFF file.
To most, clicking export and then “share to social” is the final step to collecting those sweet sweet “likes.” But that’s not enough for you; you want to understand how and why the video content moves from one device to another. The first critical step is to hit “export and save”: most editing software (like GoPro Quik for GoPro, Capture NX-D for Nikon, Capture Cam Express for Sony devices) will ask you to specify an output folder or Network Access Storage (NAS) location (ex: a hard disk connected to the wifi), a video & audio codec configuration, and a container file format, like MP4. Congratulations! You’ve now completed the first step and unlocked all the elements required to complete the video transcode.  
After you’ve confirmed the export, your computer might heat up – given the size of your newly created content, that’s not unexpected; your computer will require a lot of temporary storage (in terms of gigabytes) per second of exported video. Depending on your computer’s specifications, the video transcode might take more or less time based on your RAM: lower RAM = slower transcodes, higher RAM = faster transcodes. From the consumer perspective, transcoding is as easy as the few clicks it takes to save and export to a new device. In short, transcoding is the process of converting one (almost losslessly) compressed video into a better-compressed video format. This is how video compression works as it moves from your computer to another local device. 
To learn more about Lossless compression (as opposed to lossy compression) and how it works, check out (the completely free) Video Developer Network: Lesson 1.2 What are Lossless Compression Algorithms
— Keep reading to learn how transcoding works on Smart and over-the-top (OTT) media devices —

From your device to your computer, now transcoded wirelessly to your SmartTV

So, you’ve exported the video file from your editor and you know you have some friends or family coming over, and you want to show off your final product on your brand new high definition SmartTV. The short answer to this conundrum is to connect your device to the TV with an HDMI cable, but let’s be realistic, do you want to leave your comfortable couch for such an archaic action? Definitely not, we are in 2020 after all! You plan on streaming your video using some of the smart features on your device, but how does that work?
The editing software (like Quik for GoPro) has already created a semi-compressed file on your computer, but chances are that regardless of the server that your software used (GPU-based with Plex or an outright transcoding solution), the file needs to be compressed and optimized further for streaming capabilities. This process of compressing multiple times across transitions is imperative to lowering buffering times and improving the user experience, as fewer resources (from the back-end) are required for your TV to decode and stream the content.
If your laptop/NAS has to load and send a very large file across your local network, it might not yield stable playback on the client/TV device, because it would have to transmit a lot of data (4K quality for 1 minute is ~400 MB). So the transcoding server can employ a conversion process to HLS or DASH (like DASH for Plex client-server communication or HLS & Smooth Streaming for Google Chromecast) to achieve better quality and more stable video streaming to your SmartTV (Try it yourself with a hosted Bitmovin player). The software on the TV must support what it receives from the transcoding server (like a Chromecast extension in your browser and the stick in the TV).
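For a feel of why that matters, a quick back-of-the-envelope calculation (decimal units assumed) turns “~400 MB per minute” into the sustained throughput the link would need for direct playback:

```python
# Rough arithmetic behind "4K quality for 1 minute is ~400 MB" (decimal units).
megabytes_per_minute = 400
mbps = megabytes_per_minute * 8 / 60      # megabits per second
print(f"~{mbps:.0f} Mbit/s of sustained throughput needed for direct playback")
```

Many home Wi-Fi links can burst that high but struggle to hold it steadily, which is exactly the gap the transcode-to-HLS/DASH step closes.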
Transcoding-OTT-Media
Using these extensions or Plex communications, a client device will support a video player that can playback the data received over Wifi. According to our 2020 Video Developer Report (illustrated below) the best video codec by device coverage is the H.264 codec, so that’s always a safe bet for which standard will operate across the most possible devices.

Video Codec Usage_Bitmovin Video Developer Report 2020_Bar Graph

The ultimate transcode – sharing your content with the world!

To stream a video in a living room environment, you’ll need – on top of what we described before – a transcoding server, which encodes the transcoded video and sends it to a player that meets the playback requirements of the device you want to consume the content on. For example, the player must be able to handle different types of streams (like ts or fmp4 for HLS), packaged in different protocols (like DASH or HLS). We are talking about a device that is capable, through an app, of playing back the encoded video. When transcoding for streaming, it is important to create a video file that meets the client device’s requirements for playback; unfortunately, most client devices do not support every codec or container format natively – ex: any 2011 SmartTV. Converting and compressing (video) files at the local level is a small sample of what transcoding for the highest perceived quality entails. Achieving stable playback on your Local Area Network (LAN) is much easier, conceptually at least, than it is to transcode a video online – or even live! Maintaining quality and limiting size is even more important for web-based environments, as giving way in either category will affect your viewer’s experience; be it lower-quality content or long buffer times. Unlike the (semi)straightforward process of transcoding your content locally, publishing your video online takes a lot more steps on the backend:
Laptop Transcode => Living Room (OTT) Transcode => Internet Transcode (+ Content Delivery Network) => Online!
In short, once your device has completed the steps to transcode to your Smart TV, it has two (and a half) additional steps:

  1. Compressing and converting for internet transport – uploading to a Content Delivery Network (CDN): servers at “edge” locations (server locations at the EDGE of different networks) near users, holding transcoded copies of your video that will further distribute it
  2. Delivery and playback on some stranger’s device (Yay internet!)

The edge locations allow for faster, higher-quality transfer of data, so that viewers of your video can see the great content that you are making without buffering, and as close to the real event as possible (especially if you’re live-streaming).

Author’s Practical Example: End-to-End (E2E) video transcoding for Bitmovin

Now that you have a clear picture in mind of a daily transcoding process in your own home, I’ll shift to an ongoing real-life experience of transcoding video content for our recently published Bitmovin Video Developer Network, the go-to resource for people wanting to learn (more) about video development. The network contains multiple videos with university grade content.
I’m personally tasked with publishing all produced videos online. Our first video, Lesson 1.0 introduction & motivation (5 min runtime), was transcoded using two different methodologies  (to test for efficiency):

  1. Standard H.264 profile encode at 1920×1080 pixels, resulting in 5 GB of video files 
  2. Per-title encode, resulting in 200MB of video files (smaller by a factor of roughly 20x)

To ensure that our educational resource is truly universally streamable to users around the world it was important to transcode the videos at qualities and bitrates that all users can consume without buffering, therefore we used Per-Title methodology for all videos moving forward.

Bitmovin’s Video Developer Network

Bitmovin-DevNetwork-Main-Image
Introduced to the general public as of December 2019, the Bitmovin Developer Network was established with the sole purpose of introducing developers (and other interested parties) to the world of video development. Between our university-grade video lessons (all courses are completely free), global vid tech meet-ups, and learning labs, the developer network is the true home to grow the industry knowledge and capabilities of modern and future video technologies.
Are you interested in joining the Developer Network? All you need to do is sign-up with your email address at the bottom of the Developer Network Homepage. We’ll keep you up-to-date with the latest courses, meetups, and events! 


Did you know?

Bitmovin has a range of VOD services that can help you deliver content to your customers effectively.
Its variety of features allows you to create content tailored to your specific audience, without the stress of setting everything up yourself. Built-in analytics also help you make technical decisions to deliver the optimal user experience.
Why not try Bitmovin for Free and see what it can do for you.
