VVC: Open-GOP Resolution Switching

At IBC 2023, Fraunhofer HHI, Spin Digital and Bitmovin are presenting a paper on the practical application of a new feature that was introduced in VVC: Open-GOP resolution switching. In this blog post I want to explain what open-GOP (Group of Pictures) prediction is, what the benefits are, and why, with VVC, open-GOP prediction can finally be used in adaptive streaming.

Closed-GOP prediction structure

Let us first look at a conventional closed-GOP prediction structure. Since nothing has been decoded yet, the first frame in a bitstream is an Instantaneous Decoding Refresh (IDR) frame. When an IDR frame is received, the decoder is instantaneously reset (refreshed) and all frame buffers and other internal buffers are cleared. Since the frame has no dependencies on other frames, it can always be decoded. An IDR frame is also a Random Access (RA) point, or keyframe, at which decoding can be started. (RA points are marked in orange.)

The following frames are then encoded using predictive (P) coding. This means that they use data from the already decoded frames. This includes pixel data for motion compensation, but also motion vectors and prediction modes. Let's illustrate this with an example:

Example closed-gop prediction structure of video frames with 1 IDR frame

In this example we are encoding a total of 10 frames. The frames are numbered 0-9 in the order in which they are displayed to the viewer. (The vertical offset of the odd frame numbers is just for illustration purposes.) However, they are not encoded in the order in which they are displayed. In this example, frame 2 uses only frame 0, which has already been decoded, for prediction. Next, frame 1 is decoded, which is displayed between frames 0 and 2 and uses both frames for prediction. This so-called bi-prediction is much more efficient than prediction only from frames in the temporal past and is a key feature that makes modern video codecs so efficient.

Of course, it is impractical to have only one keyframe at the very beginning of a video. We also want to be able to start decoding at frequent points within a bitstream. This allows us to seek in a video as well as to switch between different renditions as it is done in adaptive streaming. So, we can just insert multiple IDR frames in a video:

Example closed-gop prediction structure of video frames with 2 IDR frames

In this example, frame 4 is also an IDR frame. We can start decoding at frame 0 as well as at frame 4. Frames 0-3 form a Group of Pictures (GOP) which is completely self-contained and can be decoded completely independently of any other GOPs. The same is true for the following GOP of frames 4-9. As there are no dependencies between these GOPs, this is also referred to as a closed-GOP configuration. 

The closed-GOP configuration is widely used in adaptive bitrate streaming applications where the ubiquitous approach is to split the video into segments of a certain length. Each segment is then encoded using a predetermined set of different resolutions and bitrates called renditions. Since every segment starts with an IDR frame, it is possible to start decoding at each segment which therefore enables seeking. Furthermore, the video player can also freely switch to any of the other renditions at every segment boundary.

Another benefit emerges at the encoder side where each long video is split into small pieces (segments). If these segments can be independently decoded, then they can also be independently encoded. And if we mention “many segments” and “independently encodable” then the next thought is “scalability” and “cloud compute”. And this is exactly the principle that the Bitmovin cloud encoder is based on. We take all these individual encoding tasks and then scale horizontally in the cloud. 
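To make this concrete, here is a minimal sketch of the fan-out idea in Python. The segment file names and the rendition list are made up for illustration, and a real cloud encoder distributes jobs over many machines rather than local threads and also handles audio, muxing and keyframe alignment:

import subprocess
from concurrent.futures import ThreadPoolExecutor
from itertools import product

SEGMENTS = ["seg_000.mp4", "seg_001.mp4", "seg_002.mp4"]   # hypothetical pre-split inputs
RENDITIONS = [("1920x1080", "4800k"), ("1280x720", "1800k"), ("640x360", "400k")]

def encode(segment, resolution, bitrate):
    out = f"{segment.removesuffix('.mp4')}_{resolution}_{bitrate}.mp4"
    # Because every segment starts with an IDR frame, each (segment, rendition)
    # job is fully independent and can run anywhere, in any order.
    subprocess.run(["ffmpeg", "-y", "-i", segment, "-c:v", "libx264",
                    "-s", resolution, "-b:v", bitrate, out], check=True)
    return out

with ThreadPoolExecutor(max_workers=8) as pool:
    jobs = [pool.submit(encode, s, r, b) for s, (r, b) in product(SEGMENTS, RENDITIONS)]
    print([j.result() for j in jobs])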

Open-GOP prediction structure

So the opposite of a closed-GOP is an open-GOP configuration. The key difference is that in an open-GOP prediction structure predictions between the GOPs are allowed. Let’s again look at an example:

Example vvc open-gop prediction structure of video frames with clean random access (CRA) frame

So, the frames 0, 1 and 2 are decoded in a hierarchical fashion as before. But then something different happens. The next frame in decoding order is frame 4, which is a random access point (RA). However, it is not an IDR but a Clean Random Access (CRA) point. While a CRA can also be decoded independently of any other frame, it does not reset the decoder as an IDR does, and the reference picture buffer is not cleared. Next, we have frame 3 in coding order. As before, this frame uses frames 2 and 4 as references. The rest of the frames are coded as before.

As in the closed-GOP example we can start the decoding process at frame 4 because it is not using any other frames as a reference. But this time, the decoder cannot be reset when this frame is received because the following frame (frame 3) uses previously decoded frames as references, which must therefore remain in the picture buffer. The process of starting decoding from the second GOP is therefore a bit more complex:

Example vvc open-gop prediction structure of video frames with Random Access Skipped Leading (RASL) clean random access (CRA) frame

Frame 4 is a Clean Random Access (CRA) point so decoding can be started with this frame. For the next frame in coding order (frame 3) we now have an issue. Since one of its references (frame 2) has not been decoded, frame 3 cannot be decoded. If we start decoding at the Random Access (RA) point of frame 4, the decoding of the leading picture 3 must be skipped. Consequently, the frame type is Random Access Skipped Leading (RASL). The remaining frames can be decoded as before. 

So, what are the advantages of an open-GOP configuration? So far, we just observed that decoding is more complicated. Moreover, it is now also impossible to switch to a different rendition at the CRA because we have not decoded the reference frames that are needed for the leading frames of the open-GOP, and we will have to skip decoding them. But there are two substantial advantages:

Coding performance

As I already mentioned, bidirectional prediction into the temporal past and future is one of the key features that make modern video codecs so efficient. Generally, the more past and future reference frames a frame can use for prediction, the higher the compression efficiency. Frames that do not use any other frames as reference (RA frames like IDR and CRA frames) typically have the worst compression efficiency.

While we cannot avoid having regular random access points in the bitstream for seeking, we can increase the coding efficiency of the leading pictures significantly in the open-GOP configuration. This leads to a significant reduction in overall bitrate at the same quality. In the experiments from the HHI, an overall BD-rate reduction of up to 9% could be observed. Of course, these results depend on many factors like the general coding structure, the resolution and bitrate as well as the content itself.

Visual quality across segment boundaries

In a closed-GOP configuration, the decoder must be reset with every IDR frame. An unwanted side effect of this is that the quality as well as the visual representation of a scene changes very abruptly at this point. Especially at lower bitrates, this can be perceived as a sudden jump or pumping in the video. Things that are generally hard to encode like water, clouds and trees are particularly susceptible to this effect. 

In a side-by-side comparison of the two configurations (closed-GOP on the left, open-GOP on the right), the difference is nicely visible. The pumping is especially notable in the exhaust clouds of the rocket launch in the first scene and in the trees in the background in the second scene. In the open-GOP configuration this effect is hardly visible.

Open-GOP resolution switching

I mentioned before that switching to a different rendition is only possible at IDR frames in a closed-GOP configuration. We also saw that if we start decoding a CRA frame with RASL frames, we must skip decoding of the leading frames. Obviously, we don’t want to skip decoding frames whenever the player switches to a different resolution. This would be a horrible experience for the viewer.

Fortunately, VVC has a trick up its sleeve for exactly this scenario. In the example above we noted that decoding of the RASL frame (frame 3) is not possible because it uses frame 2 as a reference, which has not been decoded when switching renditions. But what has been decoded is a different version of frame 2 from a different rendition. While this frame may have been decoded at a different quality or even at a different spatial resolution, it is a representation of the exact same frame. So, with a bit of high-level syntax, the VVC decoder can use this frame from another rendition as a reference frame for decoding frame 3. Even if the frame uses a different resolution, the decoder has a standardized set of up/down scaling filters (a capability known as Reference Picture Resampling). Let's look at this:

Example of VVC open-gop switching and decoding from Clean Random Access frame

In this example we are decoding frames 0 to 2 from a rendition at a lower resolution. Then the player decides to switch to a rendition with a higher resolution and bitrate. Decoding of the CRA (frame 4) is no problem since RA frames can be decoded independently of other frames. For frame 3, the decoder will now upscale frame 2 from the lower rendition and use this frame as a reference instead of the unavailable frame from the higher rendition. Decoding of the remaining frames is unchanged.
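In code, the decoder-side logic amounts to a reference lookup with an optional resampling step. The following is a simplified, self-contained sketch; the class and function names are invented here, and the real decoder applies the standardized filters block by block rather than to whole frames:

from dataclasses import dataclass

@dataclass
class Frame:
    poc: int                      # picture order count
    resolution: tuple             # (width, height)

def resample(frame, target):
    # Stand-in for VVC's standardized up/down-scaling filters; a real decoder
    # interpolates per block, here we only record the new resolution.
    return Frame(frame.poc, target)

def reference_for(dpb, poc, target):
    ref = dpb[poc]
    if ref.resolution != target:
        # The reference was decoded from another rendition, so it is
        # rescaled before being used for prediction (frame 2 above).
        ref = resample(ref, target)
    return ref

# Frames 0-2 were decoded at 1280x720; RASL frame 3 belongs to the 1920x1080
# rendition and references frame 2 across the switch.
dpb = {poc: Frame(poc, (1280, 720)) for poc in (0, 1, 2)}
print(reference_for(dpb, 2, (1920, 1080)))   # Frame(poc=2, resolution=(1920, 1080))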

As mentioned before, the open-GOP prediction structure significantly reduces quality pumping effects. But there is another advantage. When switching to a higher or lower rendition in a closed-GOP configuration, there is a visible jump in the quality of the video, and the bigger the difference between the renditions, the more pronounced this jump becomes. With open-GOP resolution switching, however, the intermediate leading frames that use references from both renditions act as a sort of “quality interpolation” between the renditions, which results in a much smoother transition.

PSNR over time when switching between three renditions (closed-GOP vs. open-GOP switching behavior)

Here, we can see an example of the PSNR for the resolution switching behavior. We have 3 renditions of 1920×800 (gray), 1280×534 (red) and 640×268 (blue). In conventional adaptive streaming implementations with closed-GOPs, a rendition switch would result in an abrupt jump in quality at the switching points. The yellow graph shows how the quality has a much smoother transition between the renditions when using open-GOP resolution switching in VVC.

IBC 2023

At the IBC, we are presenting a technical paper on the practical challenges of implementing open-GOP resolution switching with VVC in real-world environments. This is a joint effort of Fraunhofer HHI, Spin Digital and Bitmovin. Please join us at the IBC in the “Advances in video coding and processing” session on Sep 16th starting at 14:15 in room E102, where we will present the technical challenges that arise when deploying this feature for low-latency live transcoding as well as in the highly scalable Bitmovin cloud encoder.

VVC Video Codec – The Next Generation Codec

State of VVC Video Codec

So it’s happening. After their previous work on h.264/AVC and h.265/HEVC, the ITU-T Video Coding Experts Group (VCEG) and ISO Moving Picture Experts Group (MPEG) have again joined forces to create another video codec named Versatile Video Coding (VVC).
The first goal for the VVC Video Codec was to significantly reduce bitrate expenditure while maintaining the same visual quality compared to HEVC. When considering PSNR as a quality metric, the reference VVC encoder outperforms the reference HEVC encoder by about 40% in BD-rate. However, some subjective tests were also performed which demonstrated overall bit-savings of closer to 50%. The second goal in the standardization was versatility. On this front, VVC facilitates coding and transport for a wide range of applications and content types such as conventional video streaming, optimizations for screen content, 360-degree video, as well as live and ultra-low delay applications.

Evolution or Revolution?

Since the development of VVC was started from the basis of HEVC, the first question to ask about VVC is: Is it an evolution based on the technologies that were used in the former coding standards, or is it a really new and revolutionary way of compressing video?
Answer: It’s more or less an evolution of the basic building blocks that were already used in HEVC and various other codecs before.

  • It is still a hybrid, block-based video coding standard
  • Most technologies are based on HEVC and are further refined and improved
  • But there are also a lot of new coding tools that have not been seen in a video coding standard before

But how was this achieved? Like other video coding standards (e.g. AVC, HEVC, or AV1), VVC is based on the hybrid block-based video coding approach with conventional intra and inter prediction. As with former evolutions of video coding standards, these gains cannot be attributed to one single technique that was added in VVC but to smaller improvements in all the building blocks of the coding scheme. So here is a (not even close to complete) list of advancements in VVC:

  • The maximum block size of Coding Tree Units (CTU) that can be processed was increased. The maximum block size is now 128×128 pixels. Also, the maximum block sizes for intra prediction and transformations were increased. This is particularly beneficial as the resolution of encoded content keeps increasing. 
  • After the initial split, each CTU is further split into Coding Units. This splitting algorithm is now much more flexible and allows for more different block sizes both square and non-square.
  • The number of directions that can be used in directional intra prediction was further increased. Intra prediction can now also be performed for non-square blocks. 
  • Many aspects of motion compensated prediction (inter prediction) were improved as well like better motion vector prediction, decoder side motion vector refinement, and overlapped block motion compensation (OBMC).
  • More transform types are available, using separable transforms that combine the Discrete Cosine and Discrete Sine Transform.
  • The in-loop filters were improved as well with the addition of a new filter – Adaptive Loop Filter – for which the encoder can signal optimal parameters on a CTU basis.

As I already mentioned, this list is far from complete and there are many, many more new and adapted technologies that make VVC so efficient. If you want to learn more, there is plenty of material out there, but a good place to start is this very detailed overview paper.

Where is VVC today?

So while the standard was finalized in mid-2020, there is no widespread use of VVC in the market yet. But this was also not to be expected after such a short time. History has shown that the adoption time for new video codecs is long and usually follows the same pattern: First, devices must support playback of the new standard. While software decoding on devices with high compute capabilities and no restrictions on power usage can be implemented quickly, most consumption of media happens on devices that rely on hardware decoders. And while it takes some time to develop and deploy new decoding hardware, it takes even longer until these new devices reach a critical mass of deployment “in the wild”. At the same time, encoding solutions must be developed, tuned, and deployed. And finally, there is the question of royalties that must be paid. Only when all of these issues are resolved does it make sense to deploy actual video streaming using the new video codec. 
For VVC, we are seeing the first practical implementations. There are some software-based encoder and decoder solutions, and the first vendors have released hardware decoders in their System on a Chip (SoC) devices. Furthermore, some vendors (like Bitmovin) have deployed the VVC video codec as an option in their transcoding-as-a-service product. On the patent side, there are a few players moving the codec into production, with MPEG LA introducing the first license in early 2021. While some patent pools are forming, it is still very much unknown how much the usage of VVC will cost. And in the face of the falling cost of bandwidth, this is the biggest problem for VVC. To put it simply: If the price is not worth the bitrate savings, it will not be a thing. 
So as I mentioned, Bitmovin has already deployed VVC video encoding in its cloud-based transcoding solution. For this, we teamed up with the Fraunhofer Heinrich-Hertz-Institut (HHI) to integrate their software-based encoder VVenC, an open-source VVC encoder that is freely available on GitHub. While this is working great, there was no easy way to create a VVC video and watch it. This is where the vvDecPlayer project comes in.

Introducing the VVC Video Player: BitvvDecPlayer

What I wanted to create was a simple demonstration player that was able to stream and decode a VVC video stream in real-time. All parts of the player are based on other open-source projects. Many of the rendering routines were copied from the YUView player (another project of mine) which in turn is using the Qt framework.
When opening a playlist in the vvDecPlayer, four threads are launched that build up the decoding pipeline:

  • Download: This thread performs the download of the VVC video segments using HTTP. It has an internal buffer of 5 segments that it will try to keep full. The actual download was implemented using the Qt network module.
  • Bitstream parsing: After the download is done, we perform parsing of the high-level syntax of the bitstream. This gives us information about the segment, such as the resolution and, more importantly, the number of frames in the segment.
  • Decode: The decode thread decodes the compressed bitstream into raw YUV frames. We are using the Fraunhofer VVdeC software decoder here. The decoder is quite fast and is able to do real-time decoding of UHD content, provided that enough CPU power is available. The decoded frames are all stored in temporary buffers.
  • Conversion: The decoded frames are in the YUV domain, but we require RGB pixel data for display. The conversion is done with a native C++ function that was copied from YUView.

Finally, there is a timer running in the main thread that is trying to update the screen ‘FPS’ times per second by drawing the next converted RGB pixel buffer to the screen.
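The essence of this pipeline, bounded buffers connected by worker threads, can be sketched in a few lines of Python. The stage functions below are trivial stand-ins for the real download, parsing, decoding and conversion code (parsing and decoding are merged into one stage here), and the segment URLs are made up:

import queue, threading

def stage(work, inbox, outbox):
    """One pipeline stage in its own thread, like the player's worker threads."""
    def run():
        while (item := inbox.get()) is not None:
            outbox.put(work(item))        # bounded queues provide back-pressure
        outbox.put(None)                  # pass the shutdown sentinel downstream
    threading.Thread(target=run, daemon=True).start()

# Bounded queues model the internal buffers (e.g. keep ~5 segments downloaded).
q_urls, q_bits, q_yuv, q_rgb = (queue.Queue(maxsize=5) for _ in range(4))

stage(lambda url: f"bits({url})", q_urls, q_bits)    # download (stand-in)
stage(lambda bits: f"yuv({bits})", q_bits, q_yuv)    # parse + decode (stand-in)
stage(lambda yuv: f"rgb({yuv})", q_yuv, q_rgb)       # YUV-to-RGB (stand-in)

for i in range(3):
    q_urls.put(f"https://example.com/seg_{i}.vvc")   # hypothetical segment URLs
q_urls.put(None)

while (frame := q_rgb.get()) is not None:
    print("display", frame)   # in the real player, a main-thread timer draws this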

BitvvDecPlayer_VVC Player Performance_Screenshot
VVC Video Player in Action

The screenshot shows the player in action. On the top left, the available renditions are shown. The currently selected rendition is marked with an arrow and the currently visible rendition is highlighted in green. Use the up/down arrows to switch renditions. On the top right are the fps counter and the status of the threads. The thread status display can be enabled in the menu or by pressing Ctrl+D. On the bottom is the progress graph (Ctrl+P). This is showing many values from the decoder pipeline. The dark cyan blocks indicate the compressed data of each segment where the height corresponds to the bitrate of each segment. Within each block, there is one bar that indicates the bitrate per frame. On the bottom, the status of each frame is shown: Downloaded (grey), Decoded (blue), converted to RGB, and ready for display (green). Playback can also be paused using the space bar and playback can be switched to full-screen view by double-clicking the video.
If you want to give it a try, check out the project on GitHub. The only prerequisites you need are a compiler, CMake, and the Qt libraries. How to build really depends on the platform that you are compiling for. But it goes something like this:

  • Get Qt, either from the Qt page or, if you are on Linux, from your distro’s package manager. On macOS, Homebrew is a good option.
  • Check out the source code of the player and create a new directory to build in (e.g. ‘build’). Go into the directory and call qmake: ‘qmake ../’. After that, start the compilation. On Linux/macOS this is probably ‘make’, while on Windows it is likely ‘nmake’.
  • Next, you also need the VVdeC decoder library. Compiling that is also easy. Get the sources from the GitHub repo and create a build directory. In there, call ‘cmake -DBUILD_SHARED_LIBS=1 ..’ and then ‘cmake --build . --config Release’. This should build the shared decoder library in the ‘bin’ or the ‘lib’ folder.
  • Lastly, you can start the player. Go to ‘Settings->Select VVdeC library’ and browse to the shared VVdeC library you just built.

Now everything should be ready to stream some videos. The player comes with some sample streams provided by us and our encoder. Just select a sample from ‘File->Bitmovin Streams’. If you encounter a bug, please feel free to open a bug report on GitHub. To learn more about the BitvvDecPlayer, view our webinar with Fraunhofer HHI.

Conclusion

There is no doubt that VVC has some exciting potential and is already showing interesting results. Two years is a long time, and there are a lot of opportunities to further increase the coding performance and lower the encoding complexity, as well as to perfect some of the new tools that have already been adopted into the new standard. It will be exciting to check in again in 6 to 12 months to see how the implementations are developing.
If you'd like to learn more about how the VVC video codec works, check out our introductory blog: What is VVC and how does it work
Open projects like these are one of the benefits of working as an engineer at Bitmovin. Come join our team to work on exciting standard-setting projects.


HEVC vs VP9: The Battle of the Video Codecs

For an APAC live event, our video coding engineer Christian Feldmann compared HEVC (H.265) with VP9.
During the session, we discussed the fundamental differences between the two “modern codecs” and tied it off with an early analysis of each codec’s performance.
These results were obtained using the open-source encoders libvpx-vp9, x264, and x265.
This article delves into that experiment and shares the results of Christian’s research.

VP9 vs HEVC: The encoding setup

Software

For the test I used the following:

  • libvpx-vp9 encoder (version 1.8.2) for VP9 encoding
  • x264 encoder (tag 235ce6130168f4deee55c88ecda5ab84d81d125b) for h.264/AVC encoding
  • x265 encoder (version 3.2) for h.265/HEVC encoding.

I also compiled libvmaf (version 1.5.1) and ffmpeg (version 4.2.3) to run the encoders and perform PSNR, SSIM and VMAF measurements.
If you want to recreate the same execution environment: I used Docker, so you can build the exact same environment using my Dockerfile, which can be found here.

Test set

For the test set I used Full HD and 4K sequences from the JVET SDR test set [1], which was also used in the standardization of VVC.
Some of these sequences are well known and were already used in several prior standardization activities. All sequences are 10 seconds long and use YUV 4:2:0 subsampling.
The sequences are as follows:
 

VP9 vs HEVC Encoding test set sequence table
JVET SDR HD & 4K Sequences

Encoding

For encoding, I used default settings with ffmpeg. All encodings implemented 2-pass encoding with a set target bitrate. The corresponding ffmpeg calls look like this: 

# "br" is a placeholder for the target bitrate; each encode runs twice, once with "-pass 1" and once with "-pass 2"
ffmpeg -i input.yuv -c:v libx264 -preset veryslow -b:v br -pass 1/2 enc.mp4
ffmpeg -i input.yuv -c:v libx265 -preset slow -b:v br -pass 1/2 enc.mp4
ffmpeg -i input.yuv -c:v libvpx-vp9 -b:v br -pass 1/2 enc.mp4

Presets

I used the following presets for each encoder:

  • x264 – veryslow
  • x265 – slow 
  • libvpx-vp9 – no preset was chosen (which corresponds to a cpu-used value of 1)

These settings were chosen from experience. While they do not yield the highest possible compression performance, they correspond to a very high quality encode with a good trade-off between encoding time and quality. 
The encodings were performed under two scenarios:

  1. Fixed Resolution: no scaling is applied. Encoding was performed at the resolution of the original sequence with various different target bitrates. The bitrates for x265 and libvpx-vp9 were: 4.8 Mbit/s, 2.4 Mbit/s, 1.8 Mbit/s, 1.2 Mbit/s and 0.8 Mbit/s. For x264, these values were multiplied by a factor of two. Based on the pixel count, another factor of four was applied to the bitrates for the 4K encodes.
  2. Bitrate Ladder: Encoding was performed at a range of different resolutions and bitrates, also referred to as a bitrate ladder. These were: 1920×1080 at 4.8 Mbit/s, 1920×1080 at 2.4 Mbit/s, 1280×720 at 1.8 Mbit/s, 1280×720 at 1.2 Mbit/s, 854×480 at 0.8 Mbit/s, 640×360 at 0.4 Mbit/s and 426×240 at 0.2 Mbit/s. For the 4K encodes, two additional points with 3840×2160 at 19.2 Mbit/s and 9.6 Mbit/s were added. As in the first scenario, the bitrates were multiplied by a factor of two for x264. The final measurement step was performed after upsampling back to the resolution of the original source. The default scaling algorithm is bicubic. (A scripted version of this scenario is sketched below.)
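For illustration, the bitrate-ladder scenario can be scripted roughly as follows. This is a sketch only: the raw-input parameters and sequence name are examples, the bitrates shown are the x265/libvpx-vp9 values (the test doubled them for x264), and the measurement steps that followed are omitted:

import subprocess

LADDER = [("1920x1080", "4800k"), ("1920x1080", "2400k"), ("1280x720", "1800k"),
          ("1280x720", "1200k"), ("854x480", "800k"), ("640x360", "400k"),
          ("426x240", "200k")]

# Raw YUV input needs its geometry spelled out; these values are examples.
SRC = ["-f", "rawvideo", "-pix_fmt", "yuv420p", "-s", "1920x1080",
       "-r", "60", "-i", "MarketPlace.yuv"]

for res, rate in LADDER:
    out = f"enc_{res}_{rate}.mp4"
    for p in ("1", "2"):        # two-pass encoding toward a fixed target bitrate
        subprocess.run(["ffmpeg", "-y", *SRC, "-c:v", "libx264", "-preset", "veryslow",
                        "-s", res, "-b:v", rate, "-pass", p,
                        *(["-f", "null", "/dev/null"] if p == "1" else [out])],
                       check=True)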

Results

For each encoding, multiple different measurements were performed.
In the cases where encoding was performed at a lower spatial resolution, the measurement was performed after upscaling the reconstruction back to the resolution of the source.
PSNR and SSIM measurements were performed for the three components (Y/U/V) as well as an averaged value. VMAF was calculated as well. For the 4k source files, the 4k VMAF model was applied.
For the encoding time, I measured the absolute elapsed time as well as the CPU time per thread. This is a sample plot for the sequence “MarketPlace” in the fixed resolution scenario:

VP9 vs HEVC visual quality measurements sample plot graphs
Fixed resolution results for the Sequence MarketPlace.

Coding performance

For both scenarios I calculated BD-rate results for the average PSNR, average SSIM, and VMAF values relative to x264 [2]: 

BD-Rate Encoding results for PSNR, SSIM, and VMAF table
Averaged BD-Rate results for PSNR, SSIM, and VMAF compared to x264 for both scenarios.


As one can see, the libvpx-vp9 encoder is able to compete with x265 very well when it comes to coding performance.

However, the PSNR- and SSIM-based BD-rate values are consistently higher for libvpx-vp9, while the VMAF-based BD-rate values are higher for x265 in the fixed resolution scenario.
In the bitrate ladder scenario, both encoders show very similar results.
What is surprising is that the default x265 configuration seems to use a much lower QP for the color components (U/V) compared to the other two encoders. However, because of the way the average values are calculated, this does not have a huge impact on the BD results.
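For reference, the Bjøntegaard-Delta rate [2] behind these numbers is computed by fitting a polynomial through each encoder's (quality, log-bitrate) points and averaging the horizontal gap between the two curves over their overlapping quality range. Here is a minimal version in the same spirit as the calculateBDResults.py script mentioned below (the exact fitting details of that script may differ):

import numpy as np

def bd_rate(rates_ref, psnr_ref, rates_test, psnr_test):
    """Average bitrate difference (%) of 'test' vs 'ref' at equal quality."""
    p_ref = np.polyfit(psnr_ref, np.log(rates_ref), 3)     # log-rate as f(quality)
    p_test = np.polyfit(psnr_test, np.log(rates_test), 3)
    lo = max(min(psnr_ref), min(psnr_test))                # overlapping PSNR range
    hi = min(max(psnr_ref), max(psnr_test))
    int_ref = np.polyval(np.polyint(p_ref), hi) - np.polyval(np.polyint(p_ref), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    avg_diff = (int_test - int_ref) / (hi - lo)            # mean log-rate difference
    return (np.exp(avg_diff) - 1) * 100                    # negative = bitrate savings

# Toy numbers only: rate points (kbit/s) with their measured PSNR values.
print(bd_rate([800, 1200, 2400, 4800], [34.1, 35.8, 38.0, 40.1],
              [560, 850, 1700, 3400], [34.2, 35.9, 38.1, 40.2]))   # about -30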

VP9 vs HEVC Complexity Levels

For all encodings, I also measured the overall runtime of the encoding as well as the CPU-time per thread. Both of these values can give us an indication of how well the encoders can utilize multiple cores.
All tests were performed on an Intel 6 core (12 thread) processor. The results for x265 and libvpx-vp9 were taken relative to the values of x264 and then averaged.
The following table displays the relative factors compared to x264:

VP9 vs HEVC runtime and CPU time comparison table
Runtime factors relative to x264 for absolute runtime and CPU time.

While both x265 and libvpx-vp9 have higher runtimes compared to x264, we can see that x265 is much better at utilizing available threads efficiently, which results in much lower values for the overall runtime factors.
When it comes to CPU time, libvpx-vp9 has an advantage over x265 in the tested configuration. Similar observations can be made from the table above.
So depending on your application this may or may not be a disadvantage. For example, since our encoder has other means of utilizing the available threads efficiently, this behavior is not a big disadvantage for us. 

Files

Finally, I would like to provide all the files needed in order to recreate the results.
Furthermore, the archive includes all the result files that were used to determine my findings. I encourage everybody to double-check them.
However, for legal reasons, I cannot provide the encoded video sequences or the original uncompressed YUV test sequences.
The archive includes the following scripts which should be helpful:

  • Test shell scripts: These scripts were used to perform the encoding and the measurements in the docker container. Please feel free to use these in your own tests.
  • Python scripts: The Python scripts were used to calculate the BD results (calculateBDResults.py), to plot the measured values per sequence (plotResults.py) and to plot the results per frame (plotPerFrameResults.py). Please use these scripts to take a detailed look at the results. You need Python 3 with matplotlib installed. Each script must be called with the name of a sub-folder that should be plotted.

File:
https://drive.google.com/file/d/1wbUA56vB-LeH2H8nV-EGzhJPWW7ikkwx/view?usp=sharing

Summary

While this is just a quick and superficial encoder comparison, I tried to keep it close to practical applications. Based on this VP9 vs HEVC test, libvpx-vp9 is able to take on x265 when it comes to coding performance.
Please note that only these encoders were tested and there are other AVC, HEVC, and VP9 encoders out there which may perform better.
If you have additional inputs to the test please reach out to me! I am very willing to run this again using a different set of settings. 

References

[1] – A. Segall, E. François, W. Husak, S. Iwamura, D. Rusanovskyy – JVET common test conditions and evaluation procedures for HDR/WCG video – JVET-P2011 
[2] – Gisle Bjontegaard – Calculation of average PSNR differences between RD-curves – VCEG-M33 Austin, Texas, USA, 2-4 April 2001


Best Video Codec: An Evaluation of AV1, AVC, HEVC and VP9

This scientific evaluation puts AV1 to the test against industry-standard codecs and shows that AV1 is able to outperform VP9 and even HEVC by up to 40%.

Introduction

For practical Over-the-top (OTT) streaming applications it is mostly necessary to supply streams using multiple different video codec standards in order to stream to a wide range of devices and platforms.

The most commonly used video codecs in this scenario are AVC, VP9 and HEVC. With the standardization of AV1, another modern video coding standard is joining them.

While AVC offers the best compatibility across devices and platforms, the newer standards such as HEVC and AV1 offer a much higher compression efficiency and thereby also a better user experience.

Another key difference between the codecs is that VP9 and AV1 were developed with the goal of being open source and freely available for anybody to implement and use without any royalties while AVC and HEVC require a royalty to be paid.

The multi-codec dataset presented here adopts the aforementioned standards in a practical OTT adaptive streaming scenario. The full dataset is freely available online (http://www.itec.aau.at/ftp/datasets/mmsys18/). For an in-depth description of the dataset, please reference (https://arxiv.org/abs/1803.06874).

The Dataset

Since the main focus is on an HTTP Adaptive Streaming (HAS) dataset, we adopted a set of bitrate/resolution pairs – referred to as the bitrate ladder – ranging from a very low bitrate/resolution of 100 kbit/s at 256×144 pixels up to 4K resolution at 20 Mbit/s.

This is a well-established approach for OTT streaming applications.

For the video sequences, we tried to cover content with a range of different properties. For this, we calculated the spatial and temporal information so that the sequences contain different amounts of motion and texture.

For the adaptive streaming encoding, segment sizes of 2 as well as 4 seconds were used.

For AV1 encoding a snapshot of the reference software was used (v0.1.0-7691-g84dc6e9). For the encoding, the cpu_used preset was set to 2.

The encoding for AVC, HEVC, and VP9 was performed utilizing ffmpeg and, thus, libx264, libx265, and libvpx-vp9 were used. For these codecs, encoding was performed with the slow preset. For all codecs, a two-pass scheme was employed.

Encoding of the AV1 bitstreams according to these specifications was performed by the Institute of Information Technology at the Alpen-Adria Universität Klagenfurt. Encodings using the other codecs AVC, HEVC, and VP9 were carried out by Bitmovin using the Bitmovin Video Encoding cloud infrastructure.

All bitstreams were then collected and jointly evaluated.

The Evaluation

For evaluation, the reconstruction at lower resolutions was upscaled to the original resolution, and the weighted PSNR relative to the original source was calculated as (6·PSNR_Y + PSNR_U + PSNR_V) / 8.
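Taken literally, that weighting averages the three component PSNR values with a 6:1:1 ratio, as in this small sketch (note that some evaluations instead weight the MSE values before converting to PSNR, which yields slightly different numbers):

def weighted_psnr(psnr_y, psnr_u, psnr_v):
    # Luma carries most of the perceived quality, so it gets 6 of the 8 parts.
    return (6 * psnr_y + psnr_u + psnr_v) / 8

print(weighted_psnr(40.0, 42.5, 43.1))   # toy values -> 40.7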

From these values we calculated the corresponding Bjøntegaard-Delta bit-rate (BD-rate) values.

When calculated over the entire bitrate ladder, we were able to observe an average bitrate reduction of AV1 compared to VP9 of 13% and compared to HEVC of 17%.

When we focus on the higher part of the bitrate ladder, the BD-rate reduction compared to VP9 increases to 22%-27% while compared to HEVC, the reduction increases to 30%-43%.

It should be noted that because of the fixed bitrate ladder, the overlap becomes rather small for the highest resolutions in some sequences and the results should therefore be interpreted with some caution.

This could definitely be improved by adapting the bitrate ladder to the properties of the different sequences.

Conclusion

The dataset is meant to offer a first HAS test environment for the emerging video coding standard AV1 and for the codecs most frequently used in OTT applications: AVC, VP9 and HEVC.

The coding performance results for this test set indicate that AV1 is able to outperform VP9 and even HEVC by up to 40%.

Please note that this evaluation primarily targets HAS services and has a very specific setup.

While it can give an indication of the coding performance of AV1, the results should be interpreted with caution.

State of Compression: What is VVC and how does it work?

What is VVC?

Versatile Video Coding (VVC) is the most recent international video coding standard which was finalized in July of 2020. It is the successor to High-Efficiency Video Coding (HEVC) as it was also developed jointly by the ITU-T and ISO/IEC.
So what is really new in VVC? Is this a real revolution when it comes to video coding? In short: No. While it is technically highly advanced, it is only an evolutionary step forward from HEVC. It still uses the block-based hybrid video coding approach, an underlying concept of all major video coding standards since h.261 (from 1988). In this concept, each frame of a video is split into blocks and all blocks are then processed in sequence. 
The decoder processes every block in a loop, which starts with entropy decoding of the bitstream. The decoded transform coefficients are then put through an inverse quantization and an inverse transform operation. The output, which is an error signal in the pixel domain, then enters the coding loop and is added to a prediction signal. There are two prediction types. Inter Prediction, which copies blocks from previously coded pictures (motion compensation), and Intra Prediction, which only uses decoded pixel information from the picture being decoded. The output of the addition is the reconstructed block that is put through some filters. This usually includes a filter to remove blocking artifacts that occur at the boundaries of blocks, but also more advanced filters can be used. Finally, the block is saved to a picture buffer so it can be output on a screen once decoding is done and the loop can continue with the next block.
At the encoder side, the situation is a little more complex as the encoder has to perform the corresponding forward operations, as well as the inverse operations from the decoder to obtain identical information for prediction.

HybridVideoDecoder-VVC-Illustrated
The generalized block diagram of a hybrid video decoder.
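The loop described above can be condensed into a few lines of illustrative code. Every helper below is a trivial stub standing in for an entire decoder stage, so this shows the structure of the loop only, not any real codec:

# Purely illustrative: every helper is a stub for a whole decoder stage.
def entropy_decode(bits):        return bits.pop(0)         # one "coefficient" per block
def inverse_transform(coeff):    return coeff - 1           # IQ + IT (stub)
def predict(block_index, dpb):   return 10 * block_index    # inter/intra prediction (stub)
def in_loop_filter(sample):      return sample              # deblocking, ALF, ... (stub)

def decode_picture(bits, dpb):
    picture = []
    for block_index in range(len(bits)):
        residual = inverse_transform(entropy_decode(bits))  # entropy decode, IQ, IT
        prediction = predict(block_index, dpb)              # from the DPB or neighbors
        picture.append(in_loop_filter(prediction + residual))
    dpb.append(picture)          # keep the picture as a possible future reference
    return picture

print(decode_picture([3, 5, 7], dpb=[]))   # -> [2, 14, 26]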

Although VVC also uses these basic concepts, all components have been improved and/or modified with new ideas and techniques. In this blog post, I will show some of the improvements that VVC yields. However, this is only a small selection of new tools in VVC as a full list of all details and tools could easily fill a whole book (and someone else probably already started writing one).

VVC Coding structure

Slices, Tiles and Subpictures

As mentioned above, each frame in the video is split into a regular grid of blocks. In VVC the size of these so-called Coding Tree Units (CTU) was increased from 64×64 in HEVC to 128×128 pixels. Multiple blocks can be arranged into logical areas. These are defined as Tiles, Slices, and Subpictures. Although these techniques are already known from earlier codecs, the way they are combined is new.

TilesAndSlices-VVC-illustrated
The picture is split into four tiles of equal size (blue). There are four slices (green). The one on the left contains two tiles. On the top right, the tile is split into two slices. CTUs are marked in grey.

The key feature of these regions is that they are also logically separated in the bitstream and enable various use-cases:

  • Since each region is independent, both the encoder and the decoder can implement parallel processing.
  • A decoder could choose to only partially decode the regions of the video that it needs. One possible application is the transmission of 360° video, where a user only sees part of the full video at any time.
  • A bitstream could be designed to allow the extraction of a cropped part of the video stream on the fly without re-encoding. [JVET-Q2002]

Block Partitioning

Let’s go back to the 128×128 CTU blocks. As I mentioned before, the coding loop is traversed for each block. However, processing only full 128×128 pixel blocks would be very inefficient, so each CTU is flexibly split into smaller sub-blocks and the information on how to split it is encoded into the bitstream. The encoder can choose the best division of the CTU based on the content of the block. In a rather uniform area, bigger blocks are more efficient, whereas in areas with edges or more detail, smaller blocks are typically chosen. The partitioning in VVC is performed using two subsequent hierarchical trees:

  • Quaternary tree: There are two options for each block. Do not split the block further or split it into four square sub-blocks of half the width and half the height. For each sub-block, the same decision is made again in a recursive manner. If a block is not split further, the second tree is applied.
  • Multi-type tree: In the second tree, there are multiple options for each block. It can be split in half using a single vertical or horizontal split. Alternatively, it can be split vertically or horizontally into three parts (ternary split). As for the first tree, this one is also recursive and each subblock can be split using the same four options again. The leaf nodes of this tree that are not split any further are called Coding Units (CUs) and these are processed in the coding loop.

BlockPartioning-VVC-Illustrated
Each block is split in two stages: first using the hierarchical quaternary tree (left) and then using the hierarchical multi-type tree (right).

The factor that distinguishes VVC from other video codecs is the high flexibility of block sizes and shapes that a CTU can be split into. With this, an encoder can flexibly adapt to a wide range of video characteristics, which results in better coding performance. Of course, this high flexibility comes at a cost: the encoder must consider all possible splitting options, which requires more computation time. [JVET-Q2002]
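The two-stage recursion can be sketched as follows. The keep_splitting callback stands in for the encoder's rate-distortion decision, and the multi-type stage is reduced to a horizontal binary split for brevity (real VVC also offers vertical and ternary splits):

def quad_tree(x, y, w, h, keep_splitting, leaves):
    if w > 4 and keep_splitting(x, y, w, h):
        hw, hh = w // 2, h // 2
        for dx, dy in ((0, 0), (hw, 0), (0, hh), (hw, hh)):
            quad_tree(x + dx, y + dy, hw, hh, keep_splitting, leaves)
    else:
        multi_type_tree(x, y, w, h, keep_splitting, leaves)

def multi_type_tree(x, y, w, h, keep_splitting, leaves):
    if h > 8 and keep_splitting(x, y, w, h):
        multi_type_tree(x, y, w, h // 2, keep_splitting, leaves)
        multi_type_tree(x, y + h // 2, w, h // 2, keep_splitting, leaves)
    else:
        leaves.append((x, y, w, h))         # a Coding Unit

leaves = []
quad_tree(0, 0, 128, 128, lambda x, y, w, h: w * h > 1024, leaves)
print(len(leaves), "coding units")          # -> 16 coding units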

Block Prediction

Intra Prediction

In intra prediction, the current block is predicted from already decoded parts of the current picture. To be more precise, only a one-pixel wide strip from the neighborhood is used for normal intra prediction. There are multiple modes for predicting a block from these reference pixels. Well-known modes that are also present in VVC are Planar and DC prediction as well as Angular Prediction. While the number of discrete directions for the angle was increased from 33 to 65 in VVC, not much else changed compared to HEVC. So, let’s concentrate on tools that are actually new:

  • Wide Angle Intra Prediction: Since prediction blocks in VVC can be non-square, the angles of certain directional predictions are shifted so that more reference pixels can be used for prediction. Effectively this extends the directional prediction angles to values beyond the normal 45° and below -135°. [JVET-P0111]
  • Cross-component Prediction: In many cases (e.g. when there is an edge in the block) the luma and chroma components carry very similar information. In cross-component prediction, this is exploited by direct prediction of the chroma components from the reconstructed luma block using a linear combination of the reconstructed pixels with two parameters: a factor and an offset where the factors are calculated from the intra reference pixels. If necessary, scaling of the block is performed as well. [JVET-Q2002]

Multi Reference Line Prediction: As mentioned before, only one row of neighboring pixels is used for intra prediction. In VVC, this restriction is relaxed a bit so that prediction can be performed from two lines that are not directly next to the current block. However, there are several restrictions to this as only one line can be used at a time and no prediction across CTU boundaries is allowed. These limitations are necessary for efficient hardware implementations. [JVET-L0283]

MultiReferenceLineIntraPrediction-VVC-illustrated
In traditional intra prediction, only one line (line 0) is used for prediction of the current block. In Multi Reference Line Prediction this constraint is relaxed, and lines 1 or 3 can be used for prediction as well.

Of course, this list is not complete and there are several more intra prediction schemes which further increase the coding efficiency. The method of intra mode prediction and coding of the mode was improved and refined as well.

Inter prediction

For inter prediction, the basic tools from HEVC were carried over and adapted. For example, the basic concepts of uni- and bi-directional motion compensation from one or two reference pictures are mostly unchanged. However, there are some new tools that haven’t been used like this in a video coding standard before:
Bi-directional optical flow (BDOF): If a prediction block uses bi-prediction with one of the references in the temporal past and the second one in the temporal future, BDOF can be used to refine the motion field of the prediction block. For this, the prediction block is split into a grid of 4×4 pixel sub-blocks. For each of these 4×4 blocks, the motion vector is then refined by calculating the optical flow using the two references. While this adds some complexity to the decoder for the optical flow calculation, the refined motion vector field does not need to be transmitted and thus the bitrate is reduced. [JVET-J0024]
Decoder side motion vector refinement: Another method that allows for the motion vectors to automatically be refined at the decoder without the transmission of additional motion data is to perform an actual motion search at the decoder side. While this basic idea has been around for a while, the complexity of a search at the decoder side was always considered too high until now. The process works in three steps (a toy sketch follows the list):

  • First, a normal bi-prediction is performed, and the two prediction signals are weighted into a preliminary prediction block.
  • Using this preliminary block, a search around the position of the original block in each reference frame is performed. However, this is not a full search as an encoder would perform it, but a very limited search with a fixed number of positions.
  • If a better position is found, the original motion vector is updated accordingly. Lastly, bi-prediction with the updated motion vectors is performed again to obtain the final prediction. [JVET-J1029]
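A toy version of this bilateral search, using numpy arrays as reference pictures: the search range, cost function and integer-only offsets are simplifications of the real design, which also works on sub-pixel positions:

import numpy as np

def refine_mv(ref_past, ref_future, pos, size, search=2):
    """Keep the small mirrored offset at which both references agree best."""
    (x, y), (w, h) = pos, size
    best, best_offset = float("inf"), (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Mirrored displacement: +d in the past reference, -d in the future one.
            p0 = ref_past[y + dy:y + dy + h, x + dx:x + dx + w]
            p1 = ref_future[y - dy:y - dy + h, x - dx:x - dx + w]
            cost = np.abs(p0.astype(int) - p1).sum()        # SAD cost
            if cost < best:
                best, best_offset = cost, (dx, dy)
    return best_offset

rng = np.random.default_rng(0)
frame = rng.integers(0, 255, (64, 64), dtype=np.uint8)
# The "future" reference is the same content shifted by two pixels, so the best
# mirrored refinement is one pixel in each direction:
print(refine_mv(frame, np.roll(frame, 2, axis=1), (8, 8), (16, 16)))   # -> (-1, 0)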

Geometric Partitioning: In the section about block partitioning it was shown how each CTU can be split into smaller blocks. All of these splitting operations only split rectangular blocks into smaller rectangular blocks. Unfortunately, natural video content typically contains curved edges that can only be poorly approximated using rectangular blocks. In this case, Geometric Partitioning allows splitting a block into two parts along an angled line. For each of the two parts, motion compensation using independent motion vectors is performed and the two prediction signals are merged together using a blending at the edge. 

GeometricPartitioning-VVC-examples
Some example splits using geometric partitioning.

In the current implementation, there are 82 different geometric partition modes, made up of 24 slopes and 4 offset values for the partition line (not all slope/offset combinations are used). However, the exact number of modes is still under discussion and may still change. [JVET-P0884, JVET-P0085]
Affine motion: Conventional motion compensation using one motion vector can only represent two-dimensional planar motion. This means that any block can be moved on the image plane in x and y directions only. However, in a natural video, strictly planar motion is quite rare and things tend to move more freely (e.g. rotate and scale). VVC implements an affine motion model that uses two or three motion vectors to enable motion with four or six degrees of freedom for a block. In order to keep the implementation complexity low, the reference block is not transformed on a pixel basis, but a trick is applied to reuse existing motion compensation and interpolation methods. The prediction block is split into a grid of 4×4 pixel blocks. From the two (or three) control point motion vectors, one motion vector is calculated for each 4×4 pixel block. Then, conventional two-dimensional planar motion compensation is performed for each of these 4×4 blocks. While this implementation is not a truly affine motion compensation, it is a good approximation and allows for very efficient implementations in hardware and software. [JVET-O0070]
4x4Sub-blocks-VVC-Illustrated
For every 4×4 subblock, an individual motion vector (green) is calculated from the control point motion vectors (blue). Then, conventional motion compensation is performed per 4×4 block.
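The derivation of the per-subblock vectors for the four-parameter model can be written out directly. Here, v0 and v1 are the control-point motion vectors at the top-left and top-right corner; everything below is a floating-point sketch, whereas VVC specifies fixed-point arithmetic:

def affine_subblock_mvs(v0, v1, width, height, sub=4):
    """4-parameter affine: zoom/rotation derived from two control-point MVs."""
    (v0x, v0y), (v1x, v1y) = v0, v1
    a = (v1x - v0x) / width         # horizontal gradient of the motion field
    b = (v1y - v0y) / width
    mvs = {}
    for y in range(sub // 2, height, sub):      # evaluate at subblock centers
        for x in range(sub // 2, width, sub):
            mvs[(x, y)] = (a * x - b * y + v0x, b * x + a * y + v0y)
    return mvs

# Toy example: a slight zoom on a 16x16 block; each 4x4 subblock gets its own MV.
for pos, mv in list(affine_subblock_mvs((0.0, 0.0), (1.0, 0.0), 16, 16).items())[:4]:
    print(pos, mv)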

Transformation and Quantization

The transformation stage went through some major refactoring as well. Rectangular blocks that were introduced by the ternary split are now supported by the transformation stage by performing the transform for each direction separately. The maximum transform block size was also increased to 64×64 pixels. These bigger transform sizes are particularly useful when it comes to HD and Ultra-HD content. Furthermore, two additional types of transform were added. While the Discrete Cosine Transform in variant 2 (DCT-II) is already well known from HEVC, one further variant of the DCT (the DCT-VIII) was added, as well as one Discrete Sine Transform (DST-VII). Depending on the prediction mode, an encoder can choose different transforms depending on which one works best.
The biggest change to the Quantization stage is the increase in the maximum Quantization Parameter (QP) from 51 to 63. This was necessary as it was discovered that even at the highest possible QP setting, the coding tools of VVC worked so efficiently that it was not possible to reduce the bitrate and quality of certain encodes to the needed levels.
One more really interesting new tool is called Dependent Quantization. The purpose of the quantization stage is to map the output values from the transformation, which are continuous, onto discrete values that can be coded into the bitstream. This operation inherently comes with a loss of information. The coarser the quantization is (the higher the QP value is), the more information is lost. In the figure below, a simple quantization scheme is shown where all values between each pair of lines are quantized to the value of the marked blue cross. Only the index of the blue cross is then encoded into the bitstream and the decoder can reconstruct the corresponding value.

SimpleQuantizationScheme-VVC
Basic quantization. Each vertical line marks a decision threshold. All values between the two thresholds are quantized to one reconstruction value. The reconstruction values are marked with blue crosses.

Typically, only one fixed quantization scheme is used in a video codec. In Dependent Quantization, two of these quantization schemes are defined with slightly shifted reconstruction values.
EmbeddedQuantizationCut2-VVC
In dependent quantization, two sets of reconstruction values are used. The decoder automatically switches between these based on the previously decoded values.

Switching between the two quantizers happens implicitly using a tiny state machine driven by the parity of the already coded coefficients. The encoder can then switch between the quantizers by deliberately changing some of the reconstruction values. Finding the optimal place for this switch, where the introduced error is lowest and the switch gives the most gain, can be done using a rate-distortion trade-off. In some manner, this is related to Sign Data Hiding (used in HEVC), where information is also “hidden” in other data. [JVET-K0070]
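As a toy illustration of the principle, here are two reconstruction grids offset by half a step, with the parity of each decoded level selecting the quantizer for the next one. The real VVC design uses a four-state machine and an encoder-side trellis search, so treat this purely as an intuition aid:

DELTA = 10.0   # quantization step size

def reconstruct(levels):
    """Decode levels using two reconstruction grids offset by half a step."""
    use_shifted, out = False, []
    for level in levels:
        offset = DELTA / 2 if (use_shifted and level) else 0.0
        out.append(level * DELTA + offset)
        # The parity of the coded level implicitly selects the quantizer for the
        # next coefficient, so no extra bits are spent signalling the switch.
        use_shifted = bool(level % 2)
    return out

print(reconstruct([3, 1, 0, 2]))   # -> [30.0, 15.0, 0.0, 20.0]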

Other

All tools discussed so far were built and optimized for the coding of conventional natural two-dimensional video. However, the word `versatile` in its name indicates that VVC is meant for a wide variety of applications. And indeed VVC includes some features for more specific tasks which make it very versatile. Former codecs typically put these specialized tools into separate standards or separate extensions. One such tool is the Horizontal Wrap Around Motion Compensation. A widespread method of transmission of 360° content is to map the 360° video onto a 2D plane using equi-rectangular projection. The 2D video can then be encoded using conventional 2D video coding. However, the video has some special properties which can be used by the encoder. One property is that there is no left or right border in the video. Since the 360° view wraps around, this can be used for motion compensation. So when motion compensation from outside of the left boundary is performed, the prediction wraps around and uses pixel values from the right side of the picture.

360VideoMotionCompensation-VVC
Prediction from outside of the left side of the picture will wrap around and use pixels from the right side of the picture as well.

While this tool increases the compression performance it also helps to improve the visual quality since normal video codecs tend to produce a visible edge at the line where the left and right side of the 2D video are stitched back together. [JVET-L0231]
Another application of video coding is the coding of computer-generated video content, also referred to as screen content. This type of content usually has some special characteristics like very sharp edges and very homogeneous areas which are atypical for natural video content. One very powerful tool in this situation is Intra Block Copy which performs a copy operation from the already decoded area of the same frame. This is very similar to motion compensation with the key difference that the signalled vector does not refer to a temporal motion but just points to the source area in the current frame for the copy operation. [JVET-J0042]

Coding performance

With every Standardization meeting, the VVC test model software (VTM) is updated and a test is run to compare the latest version of VTM to the HEVC reference software (HM). This test is purely objective using PSNR values and the Bjøntegaard delta. While multiple different configurations are tested, we will focus on the so-called Random-Access configuration which is the most relevant when it comes to video transmission and streaming. 

BD-rate comparison of VTM 7.0 against HM 16.20. [JVET-Q0003]

In terms of BD-rate performance, VTM achieves the same PSNR while reducing the required bitrate by roughly 35%. Encoding time is not a perfect measure of complexity, but it gives a good first indication: encoder complexity is roughly 10 times higher than HEVC’s, while decoder complexity increases only by a factor of 1.7. Note that these results are all based on PSNR, which is known to correlate only loosely with subjectively perceived quality; preliminary experiments suggest that the subjective bitrate reduction is even higher than 35%. A formal subjective test is planned for later this year.
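For readers who want to reproduce such numbers, the Bjøntegaard delta-rate calculation can be sketched in a few lines of Python: fit log-bitrate as a cubic polynomial of PSNR for each codec and average the gap over the overlapping quality range. The rate/PSNR points below are invented purely for illustration; with the second codec at 65% of the first codec’s bitrate for equal PSNR, the function returns the roughly −35% discussed above.

```python
import numpy as np

def bd_rate(rates_a, psnr_a, rates_b, psnr_b) -> float:
    """Average bitrate change of codec B relative to codec A, in percent."""
    pa = np.polyfit(psnr_a, np.log(rates_a), 3)  # log-rate as a cubic in PSNR
    pb = np.polyfit(psnr_b, np.log(rates_b), 3)
    lo = max(min(psnr_a), min(psnr_b))           # overlapping PSNR interval
    hi = min(max(psnr_a), max(psnr_b))
    int_a = np.polyval(np.polyint(pa), hi) - np.polyval(np.polyint(pa), lo)
    int_b = np.polyval(np.polyint(pb), hi) - np.polyval(np.polyint(pb), lo)
    avg_log_diff = (int_b - int_a) / (hi - lo)
    return (np.exp(avg_log_diff) - 1) * 100      # negative = bitrate savings

# Hypothetical rate (kbit/s) / PSNR (dB) points for an HM and a VTM encode:
hm  = ([1000, 2000, 4000, 8000], [34.0, 36.5, 39.0, 41.5])
vtm = ([650, 1300, 2600, 5200], [34.0, 36.5, 39.0, 41.5])
print(f"BD-rate: {bd_rate(*hm, *vtm):.1f}%")     # -> BD-rate: -35.0%
```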

Conclusion

So, after all of this technical detail, what does the future hold for VVC? From a technical standpoint, VVC is the most efficient and advanced coding standard that money can buy. However, it is not yet known how much it will really cost. Once the standardization process officially finishes in October 2020, the process of establishing licensing terms for the new standard can begin. From previous standards, we have learned that this is a complicated process that can take a while. At the same time, other highly efficient codecs are out there, with maturing and evolving applications and implementations.

Links and more information

The JVET standardization activity is very open and transparent: all input documents to the standardization are publicly available, as is the reference encoder and decoder software.

Bitmovin & Standardization

Bitmovin is heavily involved in the standardization process around back-end vidtech; this includes our attendance and participation in the quarterly MPEG meetings, as well as our membership and involvement in AOMedia.


The post State of Compression: What is VVC and how does it work? appeared first on Bitmovin.

Cool New Video Tools: Five Encoding Advancements Coming in AV1 https://bitmovin.com/blog/cool-new-video-tools-five-encoding-advancements-coming-av1/ Thu, 01 Mar 2018 20:20:25 +0000

Now that AV1 has entered its final stage of development and is getting close to finalizing its features, it’s a perfect time to take a closer look at what’s in store for the future of video streaming. With Apple announcing its decision to join the Alliance for Open Media in January, practically all major tech leaders are on board and AV1 looks to be in good shape to become a widespread standard in the near future. Learn what video encoding advancements are coming in this new open codec in the upcoming webinar on Thursday, March 22.
But what makes AV1 stand out technologically? In this post, we will cover five key tools in AV1 that have been adopted to help reduce bandwidth demands by up to 30% while retaining or improving picture quality.

A royalty-free solution to match increasing demands in streaming quality and speed

Perhaps AV1’s most important feature is not a technological one: It was designed from the very start to be completely royalty-free, in an effort to create a truly open video codec capable of high-quality video streaming at lower bitrates. With the availability of high-resolution content constantly increasing and technologies like VR and 360° video on the rise, the need for a suitable, technologically advanced and open codec has become apparent among large-scale content providers. This desire is probably best documented by the fact that virtually all leading industry players and tech companies are contributing members of the Alliance for Open Media, the development foundation behind AV1.
The alliance has set out to finally provide an open standard for internet video streaming, following the path of other open standards like CSS or PNG, which already shape our daily digital reality. Bitmovin has been a trailblazer in pushing AV1 to become the standard for years to come. Learn more about the development timeline that led to the formation of AV1.
To name one example, Netflix, a major provider and driver of innovation in the industry, has already stated that it expects to be an early adopter of AV1, in addition to contributing to the royalty-free development community. Mozilla is another key supporter, providing a successful browser implementation of AV1 in Firefox Nightly (powered by Bitmovin). With practically all big names on board, AV1 seems poised to become the standard for a world of content that relies on high-resolution video, VR and AR applications.
For now, let’s take a closer look at the five key encoding and decoding techniques that make AV1 an interesting choice for video streaming.

Film grain synthesis

Film grain occurs naturally in photographic film, most noticeably in over-enlarged pictures, but it is also applied digitally for artistic effect. For digital video compression, film grain is a massive problem: an encoder struggles to distinguish it from real detail, and the constant “noise” consumes a lot of bits in the bitstream. The result is a high bitrate spent on transmitting very little information. Since this information contributes little to perceived quality – the human brain filters visual noise out to some extent – it is desirable not to transmit it at all, but to re-apply it later.
This idea forms the basis of AV1’s film grain synthesis. The goal is to de-noise the content before encoding it and then re-add the noise or grain before output during decoding. This way, the unnecessary information does not have to be transmitted at all and the overall amount of data can be reduced substantially.

Figure 1: Film grain synthesis process (simplified)

The potential bandwidth savings for content providers are enormous – even more so for very “noisy” content, which is common in digitized old film footage and in videos that use grain for artistic reasons. Either way, this tool can be used to great effect and is a key entry in AV1’s feature list.
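A heavily simplified sketch of the decoder-side grain re-synthesis is shown below. AV1’s actual tool uses an autoregressive grain model with per-intensity scaling; the plain Gaussian noise, the strength parameter and the function name here are stand-ins for illustration.

```python
import numpy as np

def synthesize_grain(frame: np.ndarray, strength: float, seed: int) -> np.ndarray:
    """Re-apply synthetic grain to a decoded, denoised frame."""
    rng = np.random.default_rng(seed)  # a signalled seed keeps output deterministic
    grain = rng.normal(0.0, strength, size=frame.shape)
    return np.clip(frame + grain, 0, 255).astype(np.uint8)

decoded = np.full((4, 4), 128, dtype=np.uint8)  # flat, denoised reconstruction
print(synthesize_grain(decoded, strength=3.0, seed=42))
```

Only the handful of grain parameters travels in the bitstream, instead of the noise itself in every frame.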

Constrained Directional Enhancement Filter

Filtering is an essential process in every video codec, as it drastically increases the perceived quality of the encoded video. It mostly operates along the edges of the blocks into which each picture is divided during compression. AV1 contains several sets of filters, most of which are derived from existing codecs. The Constrained Directional Enhancement Filter (CDEF) is quite possibly the most impactful addition to this range. It essentially merges two existing filters: a directional de-ringing filter as used in the Daala video codec and the constrained low-pass filter (CLPF) from the Thor video codec. The CLPF removes artifacts that stem from quantization errors and were not corrected by the preceding de-blocking filter. The directional de-ringing filter recognizes edges within each block, identifies their orientation and then conditionally applies a directional low-pass filter along those edges, resulting in a smoother picture and higher perceived quality.

Figure 2: Direction search in CDEF, as presented in Steinar Midtskogen & Jean-Marc Valin: The AV1 Constrained Directional Enhancement Filter (CDEF). See: https://arxiv.org/abs/1602.05975

CDEF merges these two filters: it analyzes the content of each block, smooths out artifacts along edges and de-blocks the picture. The search for the filtering parameters (direction and variance) is performed on the decoder side, on the already reconstructed picture, so these parameters do not have to be transmitted. The encoder runs the same filtering in order to obtain correct reference frames. Since the parameter search is carried out on the consumer’s hardware instead of being signalled, bits are saved and the required network bandwidth is reduced.
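The following toy version conveys the idea of the direction search: it picks the direction along which neighbouring pixels differ the least. The reduced set of four directions and the simple cost measure are my simplifications; CDEF proper evaluates eight directions on 8×8 blocks using a line-mean variance criterion.

```python
import numpy as np

DIRECTIONS = {            # (dy, dx) step per direction; a reduced set of four
    "horizontal": (0, 1),
    "vertical":   (1, 0),
    "diag_down":  (1, 1),
    "diag_up":    (-1, 1),
}

def find_direction(block: np.ndarray) -> str:
    """Return the direction along which the block is smoothest."""
    h, w = block.shape
    best, best_cost = None, float("inf")
    for name, (dy, dx) in DIRECTIONS.items():
        # Compare each pixel with its neighbour one step along the direction.
        y0, y1 = max(dy, 0), h + min(dy, 0)
        x0, x1 = max(dx, 0), w + min(dx, 0)
        diff = (block[y0:y1, x0:x1].astype(float)
                - block[y0 - dy:y1 - dy, x0 - dx:x1 - dx])
        cost = float(np.mean(diff ** 2))
        if cost < best_cost:
            best, best_cost = name, cost
    return best

stripes = np.tile(np.arange(8).reshape(-1, 1) * 10, (1, 8))  # horizontal stripes
print(find_direction(stripes))  # -> "horizontal": edges run left to right
```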

Warped motion and global motion compensation

Predicting and compensating motion is an important principle in video compression, as it removes redundant information that would otherwise be part of the bitstream and thus increase the amount of data transmitted. Motion compensation works by recognizing and anticipating movement patterns across frames and blocks, reducing the information that needs to be coded to the required minimum.
Warped motion compensation is a particularly interesting technique, as it anticipates movement patterns in three dimensions, predicting spatial movement trajectories within videos. Based on the calculated predictions, redundant information is identified and omitted in the coding process, resulting in a significant reduction of the required data.
Global motion compensation predicts motions for an entire frame (e. g. camera movement, zooming sequences etc.) and uses these analyses to limit the amount of information transmitted in the bitstream. Basically, information is condensed to statements like “move all blocks right” or “pan this block”, thus saving data.
Motion compensation algorithms have been used and studied for a long time, but only on a two-dimensional, planar level. AV1 marks the first time that non-planar motion compensation has been implemented in a video codec. Thanks to the constant increase in processing power of consumer devices, this technique is now ready for mass-market applications.
These techniques work extremely well for predicting large-area movements, like background motion or camera movement. Additionally, they handle consistent backgrounds and color schemes very effectively, which is one of the reasons why animated videos tend to deliver great encoding results, even at very high levels of compression.
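As a sketch, global motion can be pictured as warping every block’s coordinates with one shared set of parameters. The affine form and the nearest-neighbour sampling below are illustrative choices of mine; AV1 signals such models per reference frame and interpolates sub-pixel positions.

```python
import numpy as np

def warp_point(params, x, y):
    """Affine model: (x', y') = (a*x + b*y + tx, c*x + d*y + ty)."""
    a, b, c, d, tx, ty = params
    return a * x + b * y + tx, c * x + d * y + ty

def predict_block_global(ref, params, x0, y0, w, h):
    """Build a w x h prediction by warping block coordinates into ref."""
    out = np.zeros((h, w), dtype=ref.dtype)
    for j in range(h):
        for i in range(w):
            sx, sy = warp_point(params, x0 + i, y0 + j)
            sx = int(np.clip(round(sx), 0, ref.shape[1] - 1))
            sy = int(np.clip(round(sy), 0, ref.shape[0] - 1))
            out[j, i] = ref[sy, sx]
    return out

# A 2 px pan to the right, expressed once for the whole frame:
pan_right = (1.0, 0.0, 0.0, 1.0, 2.0, 0.0)
ref = np.arange(36).reshape(6, 6)
print(predict_block_global(ref, pan_right, x0=0, y0=0, w=2, h=2))
```

The same six parameters also express zooms and rotations, which per-block translational vectors can only approximate poorly.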

Increased coding unit size (up to 128×128)

As video resolutions keep growing, increasing the block size is an effective way to scale the compression process along with high-resolution content. Each frame is partitioned into individual coding units (or blocks), which are then processed individually during coding. Small resolutions like 1280×720 (720p) can easily be divided into blocks of 64×64 pixels, whereas the same block size is far less practical for large resolutions like 7680×4320 (8K UHD).

Figure 3: Relative sizes of common video resolutions (current and historic)
As 4K and 8K video content becomes more widespread, the move towards larger coding units is a necessary step towards high-quality compression. Bigger units mean fewer blocks per frame, which benefits the encoding of high-resolution video: it allows a higher level of compression while retaining great perceived quality, reduces coding delay for large resolutions, and lowers the signalling overhead per block. An increased block size also enables the use of bigger prediction and transform units, which again benefits high-resolution content.
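A quick back-of-the-envelope calculation shows the effect of the larger superblocks:

```python
# Moving from 64x64 to 128x128 superblocks quarters the number of blocks
# per frame, and with it the per-block signalling overhead at the top of
# the partitioning tree.

def blocks_per_frame(width: int, height: int, block: int) -> int:
    # Round up so partially covered edge blocks are counted too.
    return -(-width // block) * -(-height // block)

for name, (w, h) in {"720p": (1280, 720), "4K": (3840, 2160),
                     "8K": (7680, 4320)}.items():
    print(f"{name}: {blocks_per_frame(w, h, 64):5d} blocks at 64x64, "
          f"{blocks_per_frame(w, h, 128):5d} at 128x128")
```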

Non-binary arithmetic coding

This technique marks an interesting departure from current codecs like HEVC or AVC, in which every symbol entering the arithmetic coding engine has to be binary. In AV1, these symbols can also be non-binary, meaning they can take up to eight possible values instead of just two. The symbols are then processed by the arithmetic coding engine, which produces a binary bitstream as output. Both encoder and decoder use probability estimates to determine how many output bits a given symbol will produce. In theory, a single input symbol can therefore produce several bits or just a fraction of a bit.

Figure 4: Binary and non-binary coding schemes

Although combining multiple values into a single symbol makes each coding operation more complex, it is still less complex than performing one operation per bit. The major benefit is throughput: coding operations have to be performed serially, so handling one multi-valued symbol per cycle conveys more information per serial step than handling a single binary symbol.
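A minimal sketch of the core interval-narrowing step shows why: one coding operation handles a whole multi-valued symbol where a binary coder would need several. The exact-fraction arithmetic below deliberately sidesteps the renormalisation and carry handling a real range coder needs.

```python
from fractions import Fraction

def encode(symbols, cdfs):
    """Narrow the interval [low, low + width) once per multi-valued symbol."""
    low, width = Fraction(0), Fraction(1)
    for sym, cdf in zip(symbols, cdfs):
        low += width * cdf[sym]            # shift into the symbol's slot
        width *= cdf[sym + 1] - cdf[sym]   # shrink by the symbol's probability
    return low, width  # any number in this interval identifies the sequence

# A 4-ary alphabet with probabilities 1/2, 1/4, 1/8, 1/8 as a cumulative table:
cdf4 = (Fraction(0), Fraction(1, 2), Fraction(3, 4), Fraction(7, 8), Fraction(1))
low, width = encode([2, 0, 3], [cdf4] * 3)
print(low, width)  # a symbol of probability p consumes about -log2(p) bits
```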

Where is AV1 headed?

As the final stages of development wind down, it seems not too far-fetched to assume that AV1 is going to have a massive impact on the world of video streaming in the near future. User demand for high-quality video streaming is already more than tangible, and the coming generation of high-resolution mobile devices and VR-enabled gadgets is about to push into mainstream availability. Seeing new technologies emerge and pave their way into our everyday lives is a fascinating process, and AV1 will likely be a major factor in structuring our digital realities going forward.
AV1 is the next-generation video codec and is on track to deliver a 30% improvement over VP9 & HEVC – Learn about Bitmovin and AV1


The post Cool New Video Tools: Five Encoding Advancements Coming in AV1 appeared first on Bitmovin.
