143rd MPEG Meeting Takeaways: Green metadata support added to VVC for improved energy efficiency


Preface

Bitmovin is a proud member of and contributor to several organizations working to shape the future of video, including the Moving Picture Experts Group (MPEG), where I, along with a few senior developers at Bitmovin, am an active member. Personally, I have been a member and attendee of MPEG for 20+ years and have been documenting its progress since early 2010. Today, we’re working hard to further improve the capabilities and energy efficiency of the industry’s newest standards, such as VVC, while maintaining and modernizing older codecs like HEVC and AVC to take advantage of advancements in neural-network post-processing.

The 143rd MPEG Meeting Highlights

The official press release of the 143rd MPEG meeting can be found here and comprises the following items:

  • MPEG finalizes the Carriage of Uncompressed Video and Images in ISOBMFF
  • MPEG reaches the First Milestone for two ISOBMFF Enhancements
  • MPEG ratifies Third Editions of VVC and VSEI
  • MPEG reaches the First Milestone of AVC (11th Edition) and HEVC Amendment
  • MPEG Genomic Coding extended to support Joint Structured Storage and Transport of Sequencing Data, Annotation Data, and Metadata
  • MPEG completes Reference Software and Conformance for Geometry-based Point Cloud Compression

In this report, I’d like to focus on ISOBMFF and video codecs and, as always, I will conclude with an update on MPEG-DASH.

ISOBMFF Enhancements

The ISO Base Media File Format (ISOBMFF) supports the carriage of a wide range of media data such as video, audio, point clouds, haptics, etc., which has now been further extended to uncompressed video and images.

ISO/IEC 23001-17 – Carriage of uncompressed video and images in ISOBMFF – specifies how uncompressed 2D image and video data is carried in files that comply with the ISOBMFF family of standards. This encompasses a range of data types, including monochromatic and colour data, transparency (alpha) information, and depth information. The standard enables the industry to effectively exchange uncompressed video and image data while utilizing all additional information provided by the ISOBMFF, such as timing, color space, and sample aspect ratio for interoperable interpretation and/or display of uncompressed video and image data.

ISO/IEC 14496-15, which is based on the ISOBMFF, provides the basis for carrying “network abstraction layer (NAL) unit structured video coding formats” such as AVC, HEVC, and VVC. The current version is the 6th edition, which has been amended to support neural-network post-filter supplemental enhancement information (SEI) messages. This amendment defines the carriage of the neural-network post-filter characteristics (NNPFC) SEI messages and the neural-network post-filter activation (NNPFA) SEI messages to enable the delivery of (i) a base post-processing filter and (ii) a series of neural network updates synchronized with the input video pictures/frames.

Bitmovin has supported ISOBMFF in our encoding pipeline and API from day one and will continue to do so. For more details and information about container file formats, check out this blog post.

Video Codec Enhancements

MPEG finalized the specifications of the third editions of the Versatile Video Coding (VVC, ISO/IEC 23090-3) and the Versatile Supplemental Enhancement Information (VSEI, ISO/IEC 23002-7) standards. Additionally, MPEG issued the Committee Draft (CD) text of the eleventh edition of the Advanced Video Coding (AVC, ISO/IEC 14496-10) standard and the Committee Draft Amendment (CDAM) text on top of the High Efficiency Video Coding standard (HEVC, ISO/IEC 23008-2).

The new editions add several SEI messages, including two systems-related ones: (a) one for signaling green metadata as specified in ISO/IEC 23001-11, and (b) one for signaling an alternative video decoding interface for immersive media as specified in ISO/IEC 23090-13. Furthermore, the neural-network post-filter characteristics SEI message and the neural-network post-filter activation SEI message have been added to AVC, HEVC, and VVC.

The two SEI messages for describing and activating post-filters using neural network technology in video bitstreams could, for example, be used for reducing coding noise, spatial and temporal upsampling (i.e., super-resolution and frame interpolation), color improvement, or general denoising of the decoder output. The description of the neural network architecture itself is based on MPEG’s neural network representation standard (ISO/IEC 15938-17). As results from an exploration experiment have shown, neural-network-based post-filters can deliver better results than conventional filtering methods. Processes for invoking these new post-filters have already been tested in a software framework and will be made available in an upcoming version of the VVC reference software (ISO/IEC 23090-16).

Bitmovin and our partner ATHENA research lab have been exploring several applications of neural networks to improve the quality of experience for video streaming services. You can read the summaries with links to full publications in this blog post.

The latest MPEG-DASH Update

The current status of MPEG-DASH is depicted in the figure below:

[Figure: MPEG-DASH status updates from the 143rd MPEG meeting (07/23)]

The latest edition of MPEG-DASH is the 5th edition (ISO/IEC 23009-1:2022) which is publicly/freely available here. There are currently three amendments under development:

  • ISO/IEC 23009-1:2022 Amendment 1: Preroll, nonlinear playback, and other extensions. This amendment has been ratified already and is currently being integrated into the 5th edition of part 1 of the MPEG-DASH specification.
  • ISO/IEC 23009-1:2022 Amendment 2: EDRAP streaming and other extensions. EDRAP stands for Extended Dependent Random Access Point, and at this meeting the Draft Amendment (DAM) was approved. EDRAP increases the coding efficiency for random access and has been adopted within VVC.
  • ISO/IEC 23009-1:2022 Amendment 3: Segment sequences for random access and switching. This amendment is at Committee Draft Amendment (CDAM) stage, the first milestone of the formal standardization process. This amendment aims at improving tune-in time for low latency streaming.

Additionally, the MPEG-DASH Technologies under Consideration (TuC) document comprises a few new work items, such as content selection and adaptation logic based on device orientation, and signaling of haptics data within DASH.

Finally, part 9 of MPEG-DASH — redundant encoding and packaging for segmented live media (REAP) — has been promoted to Draft International Standard (DIS). It is expected to be finalized in the upcoming MPEG meetings.

Bitmovin recently announced its new Player Web X, which was reimagined and built from the ground up with structured concurrency. You can read more about it, and why structured concurrency matters, in this recent blog series.

The next meeting will be held in Hannover, Germany, from October 16-20, 2023. Further details can be found here.

Click here for more information about MPEG meetings and their developments.

Are you currently using the ISOBMFF or CMAF as a container format for fragmented MP4 files? Do you prefer hard-parted fMP4 or single-file MP4 with byte-range addressing? Vote in our poll and check out the Bitmovin Community to learn more. 

 Looking for more info on streaming formats and codecs? Here are some useful resources:

State of Compression: Testing h.266/VVC vs h.265/HEVC

VVC – the latest evolution for modern codecs

Versatile Video Coding (h.266/VVC) is the newest block-based hybrid codec from the Joint Video Experts Team (JVET), a joint group of ISO/IEC MPEG and ITU-T experts whose members include Bitmovin and Fraunhofer HHI. It promises to vastly improve the compression capabilities of streaming workflows for any organization in the industry, including, but not limited to, OTT, VR, and AR providers. As fellow members of MPEG, the Bitmovin encoding team was eager to test the capabilities of the newest codec and the potential improvements it offers over its predecessor, h.265/HEVC. The ultimate goal of the project was to determine the performance parameters of the VVC codec and the subjective visual quality enhancements that ensue. While Fraunhofer HHI claimed that VVC improves visual quality and reduces bitrate expenditure by around 50% over HEVC, we wanted to verify the validity of that statement.

“Overall, H.266/VVC provides efficient transmission and storage of all video resolutions from SD to HD up to 4K and 8K, while supporting high dynamic range video and omnidirectional 360° video.” – Fraunhofer HHI

The end goal of our research was to implement the VVC distribution process into Bitmovin’s standard encoding process, as illustrated below:
[Figure: The VVC encoding workflow]

Developing a VVC Encoding Process

To kick off our experiment, we added the VVC Test Model (VTM) encoder library into our encoder behind a flexible interface. However, some critical changes were needed in the Bitmovin API to enable VVC encoding.

Implementing the VVC codec

The first step to enable VVC encoding is to add VVC as a new video codec to our API. Since we hadn’t integrated a “real” encoder yet, only a limited set of settings, parameters, and inputs could be used. For the first set of tests we used the following: width, height, bitrate, pixelFormat, rate, and Constant Rate Factor (CRF).
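A minimal sketch of what such a configuration object might look like, assuming a plain Python dataclass; the field names mirror the parameters listed above, but the class itself is illustrative and not the actual Bitmovin API model:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class VvcVideoConfiguration:
    """Illustrative VVC codec configuration (not the real Bitmovin API model)."""
    name: str
    width: int                     # output width in pixels
    height: int                    # output height in pixels
    pixel_format: str = "YUV420P"  # "pixelFormat" in the API naming above
    rate: float = 25.0             # output frame rate in fps
    bitrate: Optional[int] = None  # target bitrate in bps (rate-controlled mode)
    crf: Optional[float] = None    # Constant Rate Factor (quality-controlled mode)

# Example: a 1080p quality-controlled (CRF) test configuration
config = VvcVideoConfiguration(name="vvc-test-1080p", width=1920, height=1080, crf=32.0)
```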
There are additional input values that would be necessary for a real VVC encoding but didn’t apply to our test. These are colorConfig, sampleAspectRatio, encodingMode, preset, and profile. Any other specialized settings will depend on the encoder implementation that we use when live. At the time of our test, VVenC and other encoder implementations, like x266, were not yet available, and thus those settings are unknown. With the VVC codec parameters established, we needed to actually implement them in the Bitmovin encoder.

Testing VVC in Bitmovin’s Encoder

Thanks to our flexible interface, our Lead Encoding Engineer, Christian Feldmann, had already added the library interface to the open-source reference encoder (VTM), along with a patch to our internal encoder that pushes frames to the VTM encoder and retrieves the encoded data from it. Then, we implemented a new Bitmovin encoder dependency that contains VTM with h.266 support.
[Figure: The VTM implementation in the Bitmovin encoder]
Lastly, we created a VideoService integration test to prove that the VTM integration works. However, as expected of a brand-new codec implementation, most of the initial tests failed.

Trial, error, and repeat: VVC encoding initial failures

Although expected, we were disappointed to find that running multiple renditions using the h.266 codec doesn’t work, as the VTM implementation uses global variables. As a result, we were not able to launch multiple encoding tasks on the same “machine” (i.e., within the same Docker container). Another issue we encountered was that only the 1080p output resolution produced a working encode. While promising, the VVC codec is intended to work for nearly any output format, and a single working output is clearly not the intended result. To resolve these issues, we implemented a few minor patches within our API.

Resolving the VVC encoding failures

Fortunately for our experiment, a simple patch to the encoder was enough, and the libVTM implementation worked almost out of the box. The remaining issues were two minor bugs related to connecting the CRF value of our API to the baseQP value in libVTM.
With the encoding issues resolved, we were able to run the first objective visual quality evaluations using the Peak Signal-to-Noise Ratio (PSNR). As a baseline, we ran four different CRF values at 1080p resolution to calculate BD-rate curves in comparison to h.265/HEVC. The result was h.266 and h.265 output files at the same four CRF values, which allowed the first quality comparison.
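As a quick reference for the metric itself, here is a minimal numpy sketch of PSNR; the frames are synthetic stand-ins, not our actual test output:

```python
import numpy as np

def psnr(reference: np.ndarray, reconstructed: np.ndarray, max_value: float = 255.0) -> float:
    """PSNR in dB between a reference plane and its reconstruction."""
    mse = np.mean((reference.astype(np.float64) - reconstructed.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical planes
    return 10.0 * np.log10(max_value ** 2 / mse)

# Example: PSNR of a noisy 1080p luma plane against the source
source = np.random.randint(0, 256, (1080, 1920), dtype=np.uint8)
decoded = np.clip(source.astype(np.int16) + np.random.randint(-3, 4, source.shape),
                  0, 255).astype(np.uint8)
print(f"PSNR: {psnr(source, decoded):.2f} dB")
```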

How did VVC stack up?

Once our initial implementation tests were complete, we set up a complete end-to-end encoding test with a single asset, the open-source “Project Mango” film Tears of Steel.


[Video: Side-by-side comparison of the encoded asset, VVC vs. HEVC]
Given the limitations of the encoder implementations, VVC’s performance was as promised (and expected), offering roughly 45% lower bitrate compared to HEVC. As visualized in the graph below, our results are nearly identical to the official measurements provided by JVET.
Although there are significant visual quality improvements at lower bitrate expenditure, these results came at an immense cost in computing power and time. In our initial tests, the encoding time for a 4-second video ranged from 2.5 to 6.4 hours, depending on the output parameters.
[Figure: VVC vs. h.265 visual comparison, side by side]

What does this mean moving forward?

Our initial tests of the VVC codec were run with the bare-bones VTM encoder implementation without any adjustments, optimizations, or improvements. As the industry continues to develop the back-end technology to support h.266/VVC, we can expect encoding times and computing power requirements to decrease significantly in the near future. Since the Bitmovin encoding team ran this first test, our academic and research arms have run additional tests, both at the Alpen-Adria University in Klagenfurt and at the 132nd MPEG meeting. The next set of verification tests extended VVC’s capabilities to successfully encode ultra-high-definition (UHD) content with standard dynamic range so that it can be used in newer streaming and broadcast television applications. The latest tests were run using the recently released open-source encoder implementation of VVC (VVenC), which delivered an additional 10% bitrate savings over the original VTM implementation. The Bitmovin encoding team plans to continue testing the VVC codec and adding new encoder implementations into our workflows, with expectations of officially launching VVC support in the near future.


State of Compression: What is VVC and how does it work?

What is VVC?

Versatile Video Coding (VVC) is the most recent international video coding standard, finalized in July 2020. It is the successor to High-Efficiency Video Coding (HEVC) and, like HEVC, was developed jointly by the ITU-T and ISO/IEC.
So what is really new in VVC? Is this a real revolution when it comes to video coding? In short: No. While it is technically highly advanced, it is only an evolutionary step forward from HEVC. It still uses the block-based hybrid video coding approach, an underlying concept of all major video coding standards since h.261 (from 1988). In this concept, each frame of a video is split into blocks and all blocks are then processed in sequence. 
The decoder processes every block in a loop, which starts with entropy decoding of the bitstream. The decoded transform coefficients are then put through an inverse quantization and an inverse transform operation. The output, which is an error signal in the pixel domain, then enters the coding loop and is added to a prediction signal. There are two prediction types: Inter Prediction, which copies blocks from previously coded pictures (motion compensation), and Intra Prediction, which only uses decoded pixel information from the picture being decoded. The output of the addition is the reconstructed block, which is put through some filters. This usually includes a filter to remove blocking artifacts that occur at the boundaries of blocks, but more advanced filters can be used as well. Finally, the block is saved to a picture buffer so it can be output on a screen once decoding is done, and the loop continues with the next block.
At the encoder side, the situation is a little more complex as the encoder has to perform the corresponding forward operations, as well as the inverse operations from the decoder to obtain identical information for prediction.

[Figure: The generalized block diagram of a hybrid video decoder.]
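To make the data flow concrete, here is a runnable numpy toy that walks a single 8×8 block through these reconstruction steps. All signals are synthetic and the transform is a plain orthonormal inverse DCT-II, so this only illustrates the structure of the loop, not the normative VVC processes:

```python
import numpy as np

rng = np.random.default_rng(0)

# (1) "Entropy-decoded" quantized transform coefficients for one 8x8 block
quantized_coeffs = rng.integers(-8, 9, (8, 8))

# (2) Inverse quantization (scaling by the step size), then the inverse
#     transform, yielding the error signal (residual) in the pixel domain
step_size = 12.0
dequantized = quantized_coeffs * step_size

def idct2(block: np.ndarray) -> np.ndarray:
    """2-D inverse of the orthonormal DCT-II."""
    n = block.shape[0]
    k, i = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2)
    return c.T @ block @ c

residual = idct2(dequantized)

# (3) Prediction signal: inter (copy from a previously decoded picture) or
#     intra (from neighboring decoded pixels); a random stub is used here
prediction = rng.integers(0, 256, (8, 8)).astype(np.float64)

# (4) Reconstruction: prediction + residual, clipped to the sample range.
#     A real decoder would now apply the in-loop filters (e.g. deblocking)
#     and store the picture in the buffer for output and future prediction.
reconstructed = np.clip(prediction + residual, 0, 255)
print(reconstructed.round().astype(np.uint8))
```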

Although VVC also uses these basic concepts, all components have been improved and/or modified with new ideas and techniques. In this blog post, I will show some of the improvements that VVC yields. However, this is only a small selection of new tools in VVC as a full list of all details and tools could easily fill a whole book (and someone else probably already started writing one).

VVC Coding structure

Slices, Tiles, and Subpictures

As mentioned above, each frame in the video is split into a regular grid of blocks. In VVC the size of these so-called Coding Tree Units (CTU) was increased from 64×64 in HEVC to 128×128 pixels. Multiple blocks can be arranged into logical areas. These are defined as Tiles, Slices, and Subpictures. Although these techniques are already known from earlier codecs, the way they are combined is new.

[Figure: The picture is split into four tiles of equal size (blue). There are four slices (green): the one on the left contains two tiles, and on the top right, a tile is split into two slices. CTUs are marked in grey.]

The key feature of these regions is that they are also logically separated in the bitstream and enable various use-cases:

  • Since each region is independent, both the encoder and the decoder can implement parallel processing.
  • A decoder could choose to only partially decode the regions of the video that it needs. One possible application is the transmission of 360° video, where a user can only see part of the full picture at any time.
  • A bitstream could be designed to allow the extraction of a cropped part of the video stream on the fly without re-encoding. [JVET-Q2002]

Block Partitioning

Let’s go back to the 128×128 CTU blocks. As mentioned before, the coding loop is traversed for each block. However, processing only full 128×128 pixel blocks would be very inefficient, so each CTU is flexibly split into smaller sub-blocks, and the information on how to split it is encoded into the bitstream. The encoder can choose the best division of the CTU based on the content of the block: in a rather uniform area, bigger blocks are more efficient, whereas in areas with edges or more detail, smaller blocks are typically chosen. The partitioning in VVC is performed using two subsequent hierarchical trees (a code sketch of the resulting split geometries follows below):

  • Quaternary tree: There are two options for each block: do not split the block further, or split it into four square sub-blocks of half the width and half the height. For each sub-block, the same decision is made again in a recursive manner. If a block is not split further, the second tree is applied.
  • Multi-type tree: In the second tree, there are multiple options for each block. It can be split in half using a single vertical or horizontal split. Alternatively, it can be split vertically or horizontally into three parts (ternary split). As for the first tree, this one is also recursive, and each sub-block can be split using the same four options again. The leaf nodes of this tree that are not split any further are called Coding Units (CUs), and these are processed in the coding loop.

[Figure: Each block is split in two stages: first using the hierarchical quaternary tree (left), then the hierarchical multi-type tree (right).]
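A small runnable sketch of the split geometries, assuming simple (x, y, width, height) rectangles; the actual split decisions are signalled in the bitstream, so a hard-coded example stands in for them here:

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height)

def quad_split(x: int, y: int, w: int, h: int) -> List[Rect]:
    """Quaternary split: four square sub-blocks of half width and height."""
    return [(x, y, w // 2, h // 2), (x + w // 2, y, w // 2, h // 2),
            (x, y + h // 2, w // 2, h // 2), (x + w // 2, y + h // 2, w // 2, h // 2)]

def binary_split(x: int, y: int, w: int, h: int, vertical: bool) -> List[Rect]:
    """Multi-type tree, binary case: two halves, split vertically or horizontally."""
    if vertical:
        return [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]
    return [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]

def ternary_split(x: int, y: int, w: int, h: int, vertical: bool) -> List[Rect]:
    """Multi-type tree, ternary case: three parts in 1:2:1 proportion."""
    if vertical:
        q = w // 4
        return [(x, y, q, h), (x + q, y, 2 * q, h), (x + 3 * q, y, q, h)]
    q = h // 4
    return [(x, y, w, q), (x, y + q, w, 2 * q), (x, y + 3 * q, w, q)]

# Example: quad-split a 128x128 CTU once, then apply the multi-type tree to
# two of the resulting 64x64 blocks. The leaves are the Coding Units (CUs).
quads = quad_split(0, 0, 128, 128)
cus = (ternary_split(*quads[0], vertical=True) + [quads[1], quads[2]]
       + binary_split(*quads[3], vertical=False))
print(cus)
```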

The factor that distinguishes VVC from other video codecs is the high flexibility of block sizes and shapes that a CTU can be split into. With this, an encoder can flexibly adapt to a wide range of video characteristics, which results in better coding performance. Of course, this high flexibility comes at a cost: the encoder must consider all possible splitting options, which requires more computation time. [JVET-Q2002]

Block Prediction

Intra Prediction

In intra prediction, the current block is predicted from already decoded parts of the current picture. To be more precise, only a one-pixel-wide strip from the neighborhood is used for normal intra prediction. There are multiple modes for predicting a block from these reference pixels. Well-known modes that are also present in VVC are Planar and DC prediction, as well as Angular Prediction. While the number of discrete directions for the angle was increased from 33 to 65 in VVC, not much else changed compared to HEVC. So, let’s concentrate on tools that are actually new:

  • Wide Angle Intra Prediction: Since prediction blocks in VVC can be non-square, the angles of certain directional predictions are shifted so that more reference pixels can be used for prediction. Effectively, this extends the directional prediction angles to values beyond the normal 45° and below -135°. [JVET-P0111]
  • Cross-component Prediction: In many cases (e.g. when there is an edge in the block), the luma and chroma components carry very similar information. Cross-component prediction exploits this by directly predicting the chroma components from the reconstructed luma block using a linear combination of the reconstructed pixels with two parameters, a factor and an offset, which are calculated from the intra reference pixels. If necessary, scaling of the block is performed as well (see the sketch after this list). [JVET-Q2002]
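A minimal numpy sketch of the linear model behind cross-component prediction: the chroma block is predicted as a * rec_L + b from the downsampled reconstructed luma block. The parameter derivation from the reference pixels is omitted, and the 2×2 averaging merely stands in for the normative downsampling filter:

```python
import numpy as np

def cross_component_predict(reconstructed_luma: np.ndarray, a: float, b: float) -> np.ndarray:
    """Predict a chroma block as a linear function of the co-located luma block.

    For 4:2:0 content the luma block is first brought down to the chroma
    resolution (simple 2x2 averaging here). a and b would be derived from
    the intra reference pixels; they are passed in directly in this sketch.
    """
    h, w = reconstructed_luma.shape
    down = reconstructed_luma.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return a * down + b

luma = np.random.default_rng(1).integers(0, 256, (8, 8)).astype(np.float64)
chroma_pred = cross_component_predict(luma, a=0.5, b=64.0)  # 4x4 chroma prediction
```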

Multi Reference Line Prediction: As mentioned before, only one row of neighboring pixels is used for intra prediction. In VVC, this restriction is relaxed a bit so that prediction can be performed from two lines that are not directly next to the current block. However, there are several restrictions to this as only one line can be used at a time and no prediction across CTU boundaries is allowed. These limitations are necessary for efficient hardware implementations. [JVET-L0283]

[Figure: In traditional intra prediction, only one line (line 0) is used to predict the current block. In Multi Reference Line Prediction this constraint is relaxed, and lines 1 or 3 can be used for prediction as well.]

Of course, this list is not complete and there are several more intra prediction schemes which further increase the coding efficiency. The method of intra mode prediction and coding of the mode was improved and refined as well.

Inter prediction

For inter prediction, the basic tools from HEVC were carried over and adapted. For example, the basic concepts of uni- and bi-directional motion compensation from one or two reference pictures are mostly unchanged. However, there are some new tools that haven’t been used like this in a video coding standard before:
Bi-directional optical flow (BDOF): If a prediction block uses bi-prediction with one of the references in the temporal past and the second one in the temporal future, BDOF can be used to refine the motion field of the prediction block. For this, the prediction block is split into a grid of 4×4 pixel sub-blocks. For each of these 4×4 blocks, the motion vector is then refined by calculating the optical flow using the two references. While this adds some complexity to the decoder for the optical flow calculation, the refined motion vector field does not need to be transmitted and thus the bitrate is reduced. [JVET-J0024]
Decoder side motion vector refinement: Another method that allows motion vectors to be automatically refined at the decoder without the transmission of additional motion data is to perform an actual motion search at the decoder side. While this basic idea has been around for a while, the complexity of a search at the decoder side was always considered too high until now. The process works in three steps (a sketch follows the list):

  • First, a normal bi-prediction is performed, and the two prediction signals are weighted into a preliminary prediction block.
  • Using this preliminary block, a search around the position of the original block in each reference frame is performed. However, this is not a full search as an encoder would perform it, but a very limited search with a fixed number of positions.
  • If a better position is found, the original motion vector is updated accordingly. Lastly, bi-prediction with the updated motion vectors is performed again to obtain the final prediction. [JVET-J1029]
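A toy version of the refinement search, assuming integer motion vectors and a brute-force SAD match; real DMVR uses a small fixed, hardware-friendly search pattern with mirrored offsets in the two references plus sub-pel refinement:

```python
import numpy as np

def refine_mv(preliminary: np.ndarray, reference: np.ndarray,
              x: int, y: int, mv: tuple, search_range: int = 2) -> tuple:
    """Refine one integer motion vector by a small local SAD search.

    preliminary: the weighted bi-prediction block used as matching target;
    reference: one reference picture; (x, y): block position; mv: initial
    integer motion vector (dx, dy). Returns the refined motion vector.
    """
    h, w = preliminary.shape
    best_mv, best_sad = mv, float("inf")
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = x + mv[0] + dx, y + mv[1] + dy
            candidate = reference[ry:ry + h, rx:rx + w]
            sad = np.abs(candidate.astype(np.int64) - preliminary).sum()
            if sad < best_sad:
                best_sad, best_mv = sad, (mv[0] + dx, mv[1] + dy)
    return best_mv

ref = np.random.default_rng(2).integers(0, 256, (64, 64))
block = ref[20:28, 20:28]                     # preliminary prediction matching ref at (20, 20)
print(refine_mv(block, ref, 20, 20, (1, 0)))  # the search recovers (0, 0)
```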

Geometric Partitioning: In the section about block partitioning, it was shown how each CTU can be split into smaller blocks. All of these splitting operations only split rectangular blocks into smaller rectangular blocks. Unfortunately, natural video content typically contains curved edges that can only be poorly approximated using rectangular blocks. In this case, Geometric Partitioning allows splitting a block into two parts along an angled line. For each of the two parts, motion compensation using independent motion vectors is performed, and the two prediction signals are blended together at the edge.

[Figure: Some example splits using geometric partitioning.]

In the current implementation, there are 82 different geometric partition modes. They are made up of 24 slopes and 4 offset values for the partition line. However, the exact number of modes is still under discussion and may still change. [JVET-P0884, JVET-P0085]
Affine motion: Conventional motion compensation using one motion vector can only represent two-dimensional translational motion. This means that any block can be moved on the image plane in the x and y directions only. However, in natural video, strictly translational motion is quite rare and things tend to move more freely (e.g. rotate and scale). VVC implements an affine motion model that uses two or three motion vectors to enable motion with four or six degrees of freedom for a block. In order to keep the implementation complexity low, the reference block is not transformed on a per-pixel basis; instead, a trick is applied to reuse existing motion compensation and interpolation methods. The prediction block is split into a grid of 4×4 pixel blocks. From the two (or three) control-point motion vectors, one motion vector is calculated for each 4×4 pixel block. Then, conventional two-dimensional translational motion compensation is performed for each of these 4×4 blocks. While this implementation is not truly affine motion compensation, it is a good approximation and allows for very efficient implementation in hardware and software. [JVET-O0070]
[Figure: For every 4×4 sub-block, an individual motion vector (green) is calculated from the control-point motion vectors (blue). Then, conventional motion compensation is performed per 4×4 block.]
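A runnable sketch of the sub-block motion vector derivation for the 6-parameter model (three control-point MVs). The standard uses a fixed-point derivation with rounding and clipping; this floating-point version only conveys the idea:

```python
import numpy as np

def affine_subblock_mvs(v0, v1, v2, width: int, height: int, sub: int = 4) -> np.ndarray:
    """Per-sub-block MVs from control-point MVs at the top-left (v0),
    top-right (v1) and bottom-left (v2) corners of the block.

    One MV is derived at the centre of every sub x sub sub-block; each
    sub-block is then motion-compensated with a conventional translation.
    """
    v0, v1, v2 = map(np.asarray, (v0, v1, v2))
    mvs = np.zeros((height // sub, width // sub, 2))
    for j in range(height // sub):
        for i in range(width // sub):
            cx, cy = i * sub + sub / 2, j * sub + sub / 2  # sub-block centre
            mvs[j, i] = v0 + (v1 - v0) * cx / width + (v2 - v0) * cy / height
    return mvs

# A 16x16 block that slightly rotates: the corners move in opposite directions.
mvs = affine_subblock_mvs(v0=(0, 0), v1=(0, 2), v2=(-2, 0), width=16, height=16)
print(mvs[0, 0], mvs[3, 3])  # MVs of the top-left and bottom-right sub-blocks
```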

Transformation and Quantization

The transformation stage went through some major refactoring as well. Rectangular blocks, introduced by the binary and ternary splits, are now supported by the transformation stage by performing the transform for each direction separately. The maximum transform block size was also increased to 64×64 pixels. These bigger transform sizes are particularly useful when it comes to HD and Ultra-HD content. Furthermore, two additional transform types were added. While the Discrete Cosine Transform in variant 2 (DCT-II) is already well known from HEVC, a further variant of the DCT (the DCT-VIII) was added, as well as one Discrete Sine Transform (the DST-VII). An encoder can choose between these transforms, depending on the prediction mode and on which one works best.
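A short numpy sketch of such a separable 2-D transform on a rectangular block, using the orthonormal DCT-II for both directions; DCT-VIII or DST-VII would slot in the same way, simply as different matrices per direction:

```python
import numpy as np

def dct2_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II transform matrix of size n x n."""
    k, i = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] /= np.sqrt(2)
    return m

def separable_transform(block: np.ndarray) -> np.ndarray:
    """2-D transform of a (possibly rectangular) block, one matrix per direction."""
    h, w = block.shape
    return dct2_matrix(h) @ block @ dct2_matrix(w).T

residual = np.random.default_rng(3).standard_normal((16, 32))  # rectangular block
coeffs = separable_transform(residual)
```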
The biggest change to the Quantization stage is the increase in the maximum Quantization Parameter (QP) from 51 to 63. This was necessary as it was discovered that even at the highest possible QP setting, the coding tools of VVC worked so efficiently that it was not possible to reduce the bitrate and quality of certain encodes to the needed levels.
One more really interesting new tool is called Dependent Quantization. The purpose of the quantization stage is to map the output values from the transformation, which are continuous, onto discrete values that can be coded into the bitstream. This operation inherently comes with a loss of information. The coarser the quantization is (the higher the QP value is), the more information is lost. In the figure below, a simple quantization scheme is shown where all values between each pair of lines are quantized to the value of the marked blue cross. Only the index of the blue cross is then encoded into the bitstream and the decoder can reconstruct the corresponding value.

[Figure: Basic quantization. Each vertical line marks a decision threshold. All values between two thresholds are quantized to one reconstruction value; the reconstruction values are marked with blue crosses.]

Typically, only one fixed quantization scheme is used in a video codec. In Dependent Quantization, two of these quantization schemes are defined with slightly shifted reconstruction values.
[Figure: In dependent quantization, two sets of reconstruction values are used. The decoder automatically switches between them based on the previously decoded values.]

Switching between the two quantizers happens implicitly via a tiny state machine driven by the parity of the already-coded coefficients. The encoder can then switch between the quantizers by deliberately changing the parity of some coded values. Finding the optimal place for this switch, where the introduced error is lowest and the switch gives the most gain, is a rate-distortion trade-off. In some manner, this is related to Sign Data Hiding (used in HEVC), where information is also “hidden” in other data. [JVET-K0070]
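A toy reconstruction loop that captures the mechanism, assuming a simplified two-state machine (VVC actually uses four states) and illustrative reconstruction values:

```python
import numpy as np

DELTA = 12.0  # quantization step size

def reconstruct(levels):
    """Toy dependent-quantization reconstruction with two quantizers.

    Q0 reconstructs at even multiples of DELTA / 2 and Q1 at odd multiples,
    i.e. the two sets of reconstruction values are shifted against each
    other by half a step. Which quantizer applies to the next coefficient
    follows implicitly from the parity of the levels decoded so far.
    """
    state = 0  # start with quantizer Q0
    out = []
    for level in levels:
        offset = 0.0 if state == 0 else 0.5   # Q1's values sit between Q0's
        out.append(np.sign(level) * (abs(level) + offset) * DELTA)
        state = (state + (level & 1)) % 2     # level parity drives the switch
    return out

print(reconstruct([2, 3, 1, 0]))  # -> [24.0, 36.0, 18.0, 0.0]
```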

Other

All tools discussed so far were built and optimized for the coding of conventional natural two-dimensional video. However, the word “versatile” in its name indicates that VVC is meant for a wide variety of applications, and indeed VVC includes some features for more specific tasks that make it very versatile. Former codecs typically put such specialized tools into separate standards or separate extensions. One such tool is Horizontal Wrap-Around Motion Compensation. A widespread method of transmitting 360° content is to map the 360° video onto a 2D plane using an equirectangular projection. The 2D video can then be encoded using conventional 2D video coding. However, the video has some special properties that an encoder can exploit. One property is that there is no left or right border in the video: since the 360° view wraps around, this can be used for motion compensation. So when motion compensation from outside of the left boundary is performed, the prediction wraps around and uses pixel values from the right side of the picture.

[Figure: Prediction from outside the left side of the picture wraps around and uses pixels from the right side of the picture.]

While this tool increases compression performance, it also helps improve visual quality, since normal video codecs tend to produce a visible edge at the line where the left and right sides of the 2D video are stitched back together. [JVET-L0231]
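In code, the wrap-around amounts to taking the horizontal sample coordinates modulo the picture width instead of clamping them at the border, as conventional padding would; a small numpy sketch:

```python
import numpy as np

def wraparound_fetch(picture: np.ndarray, x: int, y: int, bw: int, bh: int) -> np.ndarray:
    """Fetch a prediction block with horizontal wrap-around.

    For equirectangular 360° content the left and right picture borders are
    the same physical location, so x-coordinates wrap instead of clamping.
    """
    h, w = picture.shape
    cols = np.arange(x, x + bw) % w                 # wrap horizontally
    rows = np.clip(np.arange(y, y + bh), 0, h - 1)  # clamp vertically as usual
    return picture[np.ix_(rows, cols)]

pic = np.arange(16).reshape(4, 4)
print(wraparound_fetch(pic, x=-1, y=0, bw=2, bh=2))
# columns -1 and 0 wrap to columns 3 and 0:
# [[3 0]
#  [7 4]]
```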
Another application of video coding is the coding of computer-generated video content, also referred to as screen content. This type of content usually has some special characteristics, like very sharp edges and very homogeneous areas, which are atypical for natural video content. One very powerful tool in this situation is Intra Block Copy, which performs a copy operation from the already decoded area of the same frame. This is very similar to motion compensation, with the key difference that the signalled vector does not refer to temporal motion but simply points to the source area in the current frame for the copy operation. [JVET-J0042]

Coding performance

With every standardization meeting, the VVC test model software (VTM) is updated, and a test is run comparing the latest version of VTM to the HEVC reference software (HM). This test is purely objective, using PSNR values and the Bjøntegaard delta. While multiple different configurations are tested, we will focus on the so-called Random Access configuration, which is the most relevant when it comes to video transmission and streaming.
[Figure: BD-rate comparison of VTM 7.0 compared to HM 16.20. [JVET-Q0003]]

In terms of BD-rate performance, VTM is able to achieve similar PSNR values while reducing the required bandwidth by roughly 35%. While encoding time is not a perfect measure of complexity, it can give a good first indication: the complexity of VVC at the encoder side is roughly 10 times higher than HEVC’s, while the decoder complexity only increases by a factor of about 1.7. Please note that these results are all based on PSNR. It is well known that PSNR values are not that well coupled to the actual subjectively perceived quality, and some preliminary experiments indicate that the subjective bitrate savings are higher than 35%. A formal subjective test is planned for later this year.
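For readers who want to reproduce such numbers, here is a compact sketch of the Bjøntegaard delta-rate calculation, assuming the usual four (bitrate, PSNR) operating points per codec and a cubic fit of log-rate over PSNR; the operating points below are made up for illustration:

```python
import numpy as np

def bd_rate(rates_ref, psnr_ref, rates_test, psnr_test) -> float:
    """Bjøntegaard delta rate: average bitrate difference (%) at equal PSNR."""
    p_ref = np.polyfit(psnr_ref, np.log(rates_ref), 3)
    p_test = np.polyfit(psnr_test, np.log(rates_test), 3)
    lo = max(min(psnr_ref), min(psnr_test))   # overlapping PSNR interval
    hi = min(max(psnr_ref), max(psnr_test))
    int_ref = np.polyval(np.polyint(p_ref), hi) - np.polyval(np.polyint(p_ref), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    avg_log_diff = (int_test - int_ref) / (hi - lo)
    return (np.exp(avg_log_diff) - 1) * 100   # negative = test codec saves bitrate

# Hypothetical operating points: (kbps, PSNR in dB) for HM (ref) and VTM (test)
print(bd_rate([800, 1600, 3200, 6400], [34.0, 36.5, 39.0, 41.5],
              [520, 1040, 2080, 4160], [34.1, 36.6, 39.1, 41.6]))  # about -35%
```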

Conclusion

So, after all of this technical detail, what is the future of VVC going to be? From a technical standpoint, VVC is the most efficient and advanced coding standard that money can buy. However, it is as yet unknown how much it will really cost. Once the standardization process is officially finished in October 2020, the process of establishing licensing terms for the new standard can begin. From previous standards, we have learned that this is a complicated process that can take a while. At the same time, there are other highly efficient codecs out there whose applications and implementations are maturing and evolving.

Links and more information

The JVET standardization activity is a very open and transparent one. All input documents to the standardization are publicly available here. Also, the reference encoder and decoder software are publicly available here.

Bitmovin & Standardization

Bitmovin is heavily involved in the standardization process around back-end video technology; this includes our attendance and participation in the quarterly MPEG meetings, as well as our membership and involvement in AOMedia.

