Multiview HEVC (MV-HEVC): Powering spatial video experiences and more
https://bitmovin.com/blog/mv-hevc-encoding/ | Mon, 02 Dec 2024

The world of video technology is constantly evolving, and one of the more interesting developments in recent years is the story of MV-HEVC (Multiview High Efficiency Video Coding). Even though it was added to the HEVC specification in 2014, MV-HEVC didn’t see much commercial use for almost a decade. 

That changed when Apple launched the Apple Vision Pro, announcing that unlike Meta Quest and other headsets, their new device would take advantage of MV-HEVC for immersive video experiences. In this blog post, we’ll explore what MV-HEVC is, its potential for enhancing streaming experiences and how to get started. 

What is MV-HEVC?

MV-HEVC stands for Multiview High Efficiency Video Coding, an extension of HEVC that was added to the second edition of the standard in 2014. It’s designed to support the efficient encoding of multiview video content captured from multiple viewpoints, often to create stereoscopic (3D) effects or spatial video experiences for virtual reality (VR) and augmented reality (AR). 

Doubling the encoding and bandwidth requirements for multiple viewpoints could potentially create buffering and playback issues, but MV-HEVC enables the efficient compression and storage of stereoscopic content, reducing the bandwidth required for streaming or the file size needed for storage without compromising the video’s quality.

In short, MV-HEVC allows the encoding of multiple views of the same scene in a way that preserves video quality while keeping the bitrates manageable. This makes it a good fit for 3D, AR and VR applications that require a lot of real-time data processing. 

How MV-HEVC works

Before getting into how MV-HEVC works, let’s take a quick step back to the basics of video encoding. Temporal compression is a technique for reducing file size that is common to all major video codecs. Unless there is a scene change, individual frames of video are usually not that different from one frame to the next. Temporal compression exploits that fact and reuses data where it can, saving some bits from being encoded and shrinking the file size. 

This is done by encoding different types of frames that require less data to reconstruct for playback. I-frames are fully encoded frames that serve as anchor points, while P-frames (predictive frames) can reuse data from frames that came before them. B-frames (bi-directional predictive frames) can reuse data from frames both before and after them. If you’re interested in learning more about the fundamentals of video encoding, check out this guide.
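As a toy illustration of the idea (real codecs use motion-compensated block prediction, not per-pixel diffs), assuming frames are plain arrays of pixel values, a P-frame can be stored as just the positions and values that changed since the previous frame:

```javascript
// Toy illustration of temporal compression: store a P-frame as a sparse
// diff against the previous frame instead of a full copy.
function encodePFrame(prevFrame, currFrame) {
  const diff = [];
  for (let i = 0; i < currFrame.length; i++) {
    if (currFrame[i] !== prevFrame[i]) {
      diff.push([i, currFrame[i]]); // only changed pixels are stored
    }
  }
  return diff;
}

function decodePFrame(prevFrame, diff) {
  const frame = prevFrame.slice(); // start from the reference frame
  for (const [i, value] of diff) {
    frame[i] = value; // apply the changes
  }
  return frame;
}

// A mostly static scene: only 2 of 8 "pixels" change between frames.
const iFrame = [10, 10, 10, 10, 20, 20, 20, 20];
const pFrameSource = [10, 10, 10, 10, 20, 25, 25, 20];
const encoded = encodePFrame(iFrame, pFrameSource);
console.log(encoded.length); // 2 — far fewer values than the full frame
```

MV-HEVC applies the same reuse-what-hasn't-changed principle across views as well as across time, which is why two nearly identical eye views compress so well together.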

I touched on all of that because a key benefit of MV-HEVC is that it is also able to take advantage of the commonalities across multiple camera angles or views. In the cases of immersive and 3D videos that are created with different views for the right and left eye, the similar viewpoints usually mean there’s a lot of potential for compression, creating smaller, more manageable files for streaming and storage.

Example multiview prediction structure, with cross references between views – Image source: Fraunhofer HHI

Applications of MV-HEVC

Stereoscopic Video (3D Video)

MV-HEVC is particularly useful in the realm of 3D video or stereoscopic content, where two slightly different views (one for each eye) create the stereoscopic effect. By encoding both the left eye and right eye views efficiently in a single stream, MV-HEVC reduces the file size and bitrate compared to other methods. This is crucial for streaming applications like 3D movies or immersive VR experiences where quality and efficiency are key. Other codecs can be used for 3D stereoscopic video as we cover in this blog, but MV-HEVC is more efficient. 

Top-bottom stereoscopic format, with distinct views for the left and right eye – source: Blender Foundation

Spatial Video

Another application of MV-HEVC is in spatial video, which is typically used for virtual reality (VR) or augmented reality (AR) content. The Apple Vision Pro is built around the idea of capturing and presenting spatial video, allowing users to immerse themselves in a three-dimensional representation of a scene, combining video and depth information. MV-HEVC support is essential for these types of experiences, reducing massive bitrates of the raw files into something manageable for streaming and real-time immersive experiences. 

Side-by-side lenses on the iPhone 15 Pro and iPhone 16 allow for native capturing and recording of MV-HEVC spatial video

Multiview Video

MV-HEVC is also important for multiview video, where multiple views of the same scene are captured from different angles. This could be used in sports broadcasts, where different camera angles are encoded into a single video stream, or for applications that allow users to choose their viewing angle interactively. Depending on your exact use case, this may require multiple decoders or extra processing power that might not be available on all platforms. 

Example multiview player, now supported by Bitmovin on some platforms

Dolby Vision with MV-HEVC

MV-HEVC is now also compatible with Dolby Vision, a popular High Dynamic Range (HDR) video format that helps ensure content looks as realistic and as true to the creator’s vision as possible. Most of the top-tier premium streaming content these days is being made available in Dolby Vision format, so it makes sense that companies investing in MV-HEVC production pipelines would want to take advantage of Dolby Vision. Dolby Vision Profile 20 extends the potential quality enhancements of Dolby Vision to MV-HEVC and immersive content. 

Apple Vision Pro and beyond

The Apple Vision Pro is pushing the boundaries of immersive media, and while Apple didn’t create the VR headset segment, they have definitely put their stamp on it. There are several examples over the years of Apple’s influence on the media technology industry, from their decision not to support Flash video to their decision to (finally) support AV1.

It seems likely there will be a halo effect for MV-HEVC around the Vision Pro. One early example is the Blackmagic URSA Cine Immersive camera. I expect in 2025 we’ll see more companies venturing into MV-HEVC support, from capture to post-production to distribution.


MV-HEVC video tools

Direct recording with Apple Vision Pro and iPhone

You can record spatial video using MV-HEVC directly on the Apple Vision Pro, iPhone 15 Pro and all iPhone 16 models. The distance between the 2 camera lenses on the Vision Pro seems to provide better results with more depth compared to spatial videos captured on iPhone.

Apple AVFoundation support

Apple also added support to their AVFoundation APIs for converting side-by-side 3D videos into MV-HEVC and spatial videos. You can find more information in their developer documentation here.

Bitmovin VOD encoding beta

Bitmovin’s VOD Encoding now supports MV-HEVC as part of a private beta. If you’re interested in adding MV-HEVC to your transcoding workflows, we’d love to discuss the details with you. You can reply in the Bitmovin Community, comment on this post or get in touch with your Bitmovin contact for more info. 

Conclusion

Thanks in large part to Apple, MV-HEVC is poised to become a key technology in the future of immersive and multiview content. Its ability to efficiently encode multiple views of the same scene, reduce the data required, and maintain high video quality makes it an essential tool for everything from stereoscopic 3D movies to virtual reality experiences on devices like the Apple Vision Pro.

On their other platforms, Apple seems to have signalled a shift toward using the AV1 codec, but AV1 does not currently have multiview support. It will be interesting to see how that situation evolves both within Apple’s products and the wider video ecosystem. While the only certainty is that things will change, unless Apple abandons the Vision Pro, MV-HEVC is likely to be part of the picture for the foreseeable future.

Encoding VR and 360 Immersive Video for Meta Quest Headsets
https://bitmovin.com/blog/best-encoding-settings-meta-vr-360-headsets/ | Tue, 14 Nov 2023

This article was originally published in April 2023. It was updated Nov 14, 2023 with information about Quest 3 AV1 support.

Whether you call it Virtual Reality (VR), 360 video or Metaverse content, there are a lot of details to take into consideration in order to guarantee a good immersive experience. Video resolution, bitrates and codec settings all need to be chosen to create a high quality of experience for viewers, while staying conscious of the storage and delivery costs that can come with these huge files. Although these topics have been widely discussed for 2D displays like mobile phones and TVs, VR streaming differs enormously from traditional screens: it uses different display technology that drastically shortens the viewing distance from eye to screen. In addition, VR headset specs differ from one device to another, so the same video may produce a different visual experience depending on the model. In this post we share the things you need to consider, along with tips and best practices for encoding great-looking VR content, specifically for playback on Meta Quest (formerly Oculus) headsets.

Visual quality requirements of 3D-VR vs 2D videos

Unlike traditional 2D screens, where viewers are located at a considerable distance from the screen, VR viewers look at a smaller screen much closer to the eyes. This drastically changes how a video should be encoded in order to guarantee good visual quality for an immersive 3D experience. For the same reason, traditional 2D video quality metrics such as VMAF and PSNR are usually not useful for measuring the perceived quality of 3D VR content. For instance:

VMAF for 3D-VR

VMAF considers 2D viewers located at a viewing distance in the order of magnitude of the screen size, for example:

  • The 4K VMAF model (vmaf_4k_v0.6.1) assumes the viewer is located at 1.5H from the screen, where H is the height of the TV screen.
  • The HD VMAF model (vmaf_v0.6.1) assumes a viewer located at 3H from the screen.

These models correspond to pixel densities of about 60 pixels per degree (ppd) and 75 ppd for 4K and HD respectively. For VR video, however, the effective pixel density is much lower: the Meta Quest 2 specs, for instance, mention a pixel density of 20 ppd. The predefined VMAF models are therefore not suitable. In fact, if you do use VMAF to measure the visual quality (VQ) of a VR video intended for headset playback, you’ll probably find the video doesn’t look good enough even with a high VMAF score – this is because of the “zoom in” the Quest applies in comparison to traditional screens.
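As a rough sanity check on the headset figure, pixels per degree can be approximated as the per-eye horizontal resolution divided by the horizontal field of view. The specs used below (1832 px per eye and a ~90° FOV for Quest 2) are approximations, not official calibration values, but they land close to the 20 ppd figure above:

```javascript
// Approximate pixels per degree (ppd) for a headset: per-eye horizontal
// resolution divided by the horizontal field of view in degrees.
function pixelsPerDegree(horizontalPixelsPerEye, horizontalFovDegrees) {
  return horizontalPixelsPerEye / horizontalFovDegrees;
}

// Approximate Meta Quest 2 specs: 1832 px per eye, ~90 degree FOV.
const quest2Ppd = pixelsPerDegree(1832, 90);
console.log(quest2Ppd.toFixed(1)); // ≈ 20.4, close to Meta's quoted 20 ppd
```

Compare that with the ~60+ ppd the VMAF models assume and it becomes clear why each pixel is far more visible on a headset, and why scores tuned for TV viewing overestimate perceived quality.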

PSNR for 3D-VR

While it is not a strict rule, good VQ is generally expected for 2D videos when PSNR values are between 39 dB and 42 dB for average- to high-complexity content (see [1] [2]). However, this PSNR range is usually not enough to create a good immersive experience with Quest headsets. For instance, in our empirical tests we found that a PSNR above 48 dB is required for good VQ on Quest devices.
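PSNR itself is simple to compute, and the sketch below (for 8-bit frames) shows why 48 dB is such a demanding target: a frame that is off by just one code value in every pixel already sits right around that threshold.

```javascript
// PSNR between two 8-bit frames: 10 * log10(MAX^2 / MSE),
// where MSE is the mean squared error over all pixel values.
function psnr(frameA, frameB, maxValue = 255) {
  let sumSquaredError = 0;
  for (let i = 0; i < frameA.length; i++) {
    const e = frameA[i] - frameB[i];
    sumSquaredError += e * e;
  }
  const mse = sumSquaredError / frameA.length;
  if (mse === 0) return Infinity; // identical frames
  return 10 * Math.log10((maxValue * maxValue) / mse);
}

// A uniform error of 1 per pixel gives MSE = 1, so
// PSNR = 10 * log10(255^2) ≈ 48.13 dB — right at the VR quality bar.
const reference = [100, 120, 140, 160];
const distorted = [101, 121, 141, 161];
console.log(psnr(reference, distorted).toFixed(2)); // 48.13
```

In other words, hitting 48 dB leaves almost no room for visible coding error, which is why VR encodes need such generous bitrates or low CRF values.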

image source: Meta Quest Blog

The Best Encoding Settings for Meta Quest devices

A general overview of the Video Requirements can be found at the Meta Quest website. Additionally, the following encoding settings may be useful when building your encoding workflow: 

Resolution

The minimum resolution suggested by Meta is 3840 x 3840 px for stereoscopic content and 3840 x 1920 px for monoscopic content, which is much higher than what earlier headset generations or mobile devices required.

H265 Video Codec Settings 

Video Codec – Meta Quest devices support the H264 (AVC) and H265 (HEVC) codecs. However, given that they require resolutions above 3840 px, we strongly recommend H265 due to its higher encoding efficiency compared to H264.

GOP Length – In our tests we successfully achieved good VQ within the recommended bitrate range using a 2-second GOP length for 30 fps content. However, since video on demand is not as latency sensitive, we suggest using longer GOP lengths to further improve encoding efficiency if needed.

Target bitrate and CRF – Meta suggests a target bitrate between 25-60 Mbps and as mentioned, we strongly suggest using the H265 codec to maintain high visual quality within that range. If the bitrate goes too far above the suggested maximum, customers may experience slow playback or stalling due to device performance issues. 

Having said all that, it is worth mentioning that setting a proper bitrate to meet the VQ expectations is really challenging, mainly because the bitrates necessary may change from one piece of content to another depending on their visual complexity. Because of that, we suggest using a CRF based encoding instead of a fixed bitrate. Specifically, we found that when talking about H265, a CRF between 17-18 would produce videos that are suitable for viewing on Quest headsets without excessively high bitrates. 

Building 360-VR encoding workflows with Bitmovin VOD Encoding

Bitmovin’s VOD Encoding provides a set of highly flexible APIs for creating workflows that fully meet Meta Quest encoding requirements. For instance:

  • If adaptive bitrate streaming is required at the output, Bitmovin Per-Title encoding can be used to automatically create the ABR ladder with the top rendition driven by the desired CRF target.
  • If progressive file output is required, a traditional CRF encoding can be used by capping the bitrates properly.
  • Additionally, Bitmovin filters can be used to create monoscopic content from a stereoscopic input, for instance by cropping the original stereoscopic video to convert it from a top-and-bottom or side-by-side arrangement into a single view. Monoscopic outputs can be viewed on 2D displays, extending the reach of your 360 content beyond headsets.

Per-Title Encoding configuration for VR

The following per-title configuration may be used as a reference for encoding VR content. Depending on the content complexity, the output may include from 4 to 7 renditions, with the top rendition targeting a CRF value of 17.

perTitle: {
  h265Configuration: {
    minBitrate: 5000000,
    maxBitrate: 60000000,
    targetQualityCrf: 17,
    minBitrateStepSize: 1.5,
    maxBitrateStepSize: 2,
    codecMinBitrateFactor: 0.6,
    codecMaxBitrateFactor: 1.4,
    codecBufsizeFactor: 2,
    autoRepresentations: {
      adoptConfigurationThreshold: 0,
    },
  },
}

There are also full code samples here if you would like to dig deeper.

The same configuration can be used to encode any VR format, such as top-and-bottom, side-by-side or monoscopic 360 content. The per-title algorithm will automatically propose a proper bitrate and resolution for each VR format based on the input details. Additionally, it is strongly recommended to use VOD_HIGH_QUALITY as the encoding preset and THREE_PASS as the encoding mode. This will ensure the Bitmovin Encoder delivers the best possible visual quality.
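For reference, a corresponding codec configuration might look like the fragment below. This is a sketch based on the Bitmovin API SDK for JavaScript; the class and field names (H265VideoConfiguration, PresetConfiguration.VOD_HIGH_QUALITY, EncodingMode.THREE_PASS, crf) follow the public API, but verify them against the current SDK documentation before use.

```javascript
const {
  H265VideoConfiguration,
  PresetConfiguration,
  EncodingMode,
} = require('@bitmovin/api-sdk');

// Sketch: H265 codec configuration with the recommended preset,
// encoding mode and CRF target for Meta Quest playback.
const h265Config = new H265VideoConfiguration({
  name: 'H265 VR top rendition',
  presetConfiguration: PresetConfiguration.VOD_HIGH_QUALITY,
  encodingMode: EncodingMode.THREE_PASS,
  crf: 17,
});
```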

In our tests using typical medium-high complexity content, we found that using a CRF of 17 produces good VQ for Meta Quest playback, with PSNR values above 48 dB and bitrates that are usually below the suggested maximum of 60 Mbps. 

Alternatively, traditional CRF encoding can be used instead of Per-title, for instance if only one rendition is desired at the output – with no ABR.

Creating monoscopic outputs from stereoscopic inputs

Usually, VR 360 cameras record content in stereoscopic format, in either top-and-bottom or side-by-side arrangements. However, depending on the use case, you may need to convert the content from stereoscopic to monoscopic format. This can be easily solved with the Bitmovin VOD Encoding API by applying cropping filters that remove the required pixels or frame percentage from the stereoscopic content, turning it into monoscopic format, i.e., by removing the left/right or the top/bottom side from the input asset.

Top-Bottom Stereoscopic Format source: Blender Foundation

For instance, the following JavaScript snippet would remove the top side of a 3840 x 3840 stereoscopic content:

// ...

// Crop filter definition: remove the top 1920 px (one eye's view)
const cropTopSideFilter = new CropFilter({
  name: "stereo-to-mono-filter-example",
  left: 0,
  right: 0,
  bottom: 0,
  top: 1920,
})

// Crop filter creation (assign the created resource to a new variable
// so its server-generated id can be referenced below)
const createdCropFilter = await bitmovinApi.encoding.filters.crop.create(cropTopSideFilter)

// Stream filter definition
const cropTopSideStreamFilter = new StreamFilter({
  id: createdCropFilter.id,
  position: 0,
})

// Stream filter creation
await bitmovinApi.encoding.encodings.streams.filters.create(<encoding.id>, <videoStream.id>, [cropTopSideStreamFilter])

AV1 Codec Support on Meta Quest 3

In the recommended settings above, we strongly suggested using HEVC over H.264 because the newer-generation codec offers greater compression efficiency, which translates into bandwidth savings and a better quality of experience for users. Now with the Quest 3, you can take advantage of AV1, an even newer codec that outperforms HEVC. On average, our testing has shown that you can maintain equivalent quality while using around 30% lower bitrate with AV1. This will depend on the type of content you’re working with, so if you’re experimenting with AV1 for the Quest 3, choosing a bitrate that’s ~25% lower than your HEVC encoding would be a good place to start. DEOVR shared a 2900p sample .mp4 file encoded with AV1, but you can also create your own with a Bitmovin trial account.
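As a starting point for such experiments, the ~25% reduction suggested above can be applied across an existing HEVC ladder. The ladder values below are illustrative, not recommendations:

```javascript
// Derive starting AV1 bitrates from an existing HEVC ladder by applying
// a flat reduction factor (~25% lower, per the suggestion above).
function av1StartingBitrates(hevcBitrates, reduction = 0.25) {
  return hevcBitrates.map((bps) => Math.round(bps * (1 - reduction)));
}

// Illustrative HEVC ladder in bits per second (25, 40 and 60 Mbps).
const hevcLadder = [25000000, 40000000, 60000000];
console.log(av1StartingBitrates(hevcLadder)); // [ 18750000, 30000000, 45000000 ]
```

From there, per-title or CRF-driven encoding can refine the targets per asset, since the achievable savings vary with content complexity.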

Ready to start encoding your own 360 content for Meta Quest headsets? Sign up for a free trial and get going today! 

Related links:

Bitmovin Docs – Encoding Tutorials | Per-Title Configuration Options explained

Bitmovin Player 360 video demo

Bitmovin Receives Excellence in DASH Award for Tile-Based Streaming of VR and 360° Video
https://bitmovin.com/blog/bitmovin-receives-excellence-dash-award-tile-based-streaming-vr-360-video/ | Wed, 28 Jun 2017

Tile-Based Streaming could be the future of VR and 360 video

Tile-Based Streaming is set to play a major role in delivering VR and 360 video to mainstream audiences by reducing bandwidth requirements, reducing costs and vastly increasing accessibility.

Bitmovin engineers and co-founders Mario Graf (@grafmar_io), Christian Timmerer (@timse7), and Christopher Mueller (@chris_bitmovin) have been awarded the Excellence in DASH Award at ACM Multimedia Systems 2017 in Taipei, Taiwan.

The Bitmovin team received the third-place award for their paper “Towards Bandwidth Efficient Adaptive Streaming of Omnidirectional Video over HTTP: Design, Implementation, and Evaluation” [PDF], [Session Slides]. In this paper, the researchers analyze adaptive bitrate streaming of VR and 360-degree video over HTTP and describe the use of tiles, as specified within modern video codecs such as HEVC/H.265 and VP9, achieving bitrate savings of 40-65%.
These findings establish a baseline for advanced streaming techniques for immersive video such as VR and 360-degree video, outlining real-life applications and a research roadmap. Bitmovin is committed to shaping the future of online video and building streaming solutions for commercial applications that enhance end-user experience and reduce friction for video developers. You can learn more about Bitmovin end-to-end support for immersive video in this tutorial – VR & 360° Video and Adaptive Bitrate Streaming.

The award is established by the DASH Industry Forum. The DASH-IF “creates interoperability guidelines on the usage of the MPEG-DASH streaming standard, promotes and catalyze the adoption of MPEG-DASH and help transition it from a specification into a real business. It consists of the major streaming and media companies, such as Microsoft, Netflix, Google, Ericsson, Samsung and Adobe.” Bitmovin was among the first to deploy DASH in accordance with the DASH-IF open standard guidelines. We are an active member of the DASH-IF and actively contribute research and testing data to the DASH community.

What is Tile-Based Streaming and why does it matter?

The nature of 360 video creates much larger file sizes, simply due to the extra pixels required for a spherical image. But many of the pixels delivered in the video stream are outside of the viewport and are never seen, causing unnecessarily high bandwidth requirements and CDN costs.

Tile-based streaming solves this problem by breaking a 360° video into “tiles” and streaming the highest quality only to the visible sections of the video, while using lower-quality (smaller) files for the unseen tiles.

By saving bandwidth and reducing CDN costs, this technique will be among the next major innovations in 360 and VR video, and Bitmovin is leading the way towards making it available for commercial applications.
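The viewport-dependent selection at the heart of tile-based streaming can be sketched as a toy function. Real implementations use HEVC motion-constrained tiles signalled via MPEG-DASH; the simple yaw-based layout below is an illustration, not a production algorithm:

```javascript
// Toy viewport-dependent tile selection: tiles inside the viewer's field of
// view get the high-quality rendition, everything else gets a low-quality one.
function selectTileQualities(tileCount, viewportYawDeg, fovDeg) {
  const tileWidth = 360 / tileCount; // tiles evenly spanning 360° of yaw
  const qualities = [];
  for (let i = 0; i < tileCount; i++) {
    const tileCenter = i * tileWidth + tileWidth / 2;
    // Shortest angular distance between tile center and viewport center.
    let delta = Math.abs(tileCenter - viewportYawDeg) % 360;
    if (delta > 180) delta = 360 - delta;
    qualities.push(delta <= fovDeg / 2 ? "high" : "low");
  }
  return qualities;
}

// 8 tiles of 45° each, viewer looking at yaw 0° with a 90° field of view:
console.log(selectTileQualities(8, 0, 90));
// only the tiles nearest yaw 0° stream in high quality
```

As the viewer turns their head, the player re-requests the newly visible tiles in high quality, which is where short segments and fast ABR switching become critical.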
To see tile-based streaming in action, request a demo with our video solutions experts.
