audio encoding – Bitmovin

Providing a Premium Audio Experience in HLS with the Bitmovin Encoder

Mathew Carrigan — Mon, 01 Jul 2024 14:53:51 +0000

Introduction

Many streaming providers are looking for ways to offer a more premium, high-quality experience to their users. One often overlooked component of streaming quality is audio: specifically, which audio bitrates, channel layouts, and audio languages are available, and how these options can be delivered to viewers on a range of devices. While there are many ways of improving video streaming quality and experience, such as Per-Title Encoding, Multi-Bitrate Video, High Dynamic Range (HDR), and high resolutions, there are also some great ways of enhancing a user’s experience with premium HLS audio. Some of the most important considerations for audio streaming are:

Adaptive Streaming: serving multiple audio bitrates for various streaming conditions
Reduced Bandwidth & Device Compatibility: multi-codec audio for better compression at reduced bitrates
Improved User Experience: 5.1 (or greater) surround sound or even lossless audio
Accessibility and Localization: such as multi-language or descriptive audio

You can learn even more about how audio encoding affects the streaming experience in this blog.

In Bitmovin’s 2023-24 Video Developer Report, we saw that immersive audio ranked in the top 15 areas for innovation, while audio transcription was the #1 ranked use case for AI and ML. Furthermore, though AAC remains the most widely used audio codec, mostly due to its wide device support, Dolby Digital/+ and Dolby Atmos are the #2 and #3 ranked audio codecs that streaming companies either currently support or plan to support in the near future.

Audio codec usage – source: Bitmovin Video Developer Report

With HLS and its multivariant approach, this is all possible, but understanding just how to construct and organize your HLS multivariant playlist can be tricky at first. In this tutorial we will look at some best practices in HLS for serving alternate audio renditions, followed by an example at the end of this article showing how to do this with the Bitmovin Encoder.

Basic audio stream packaging

The most basic way to package audio for HLS is to mux the audio track with each video track. This works for very simple configurations where you only output a single AAC Stereo audio track at a single bitrate. While the benefit of this approach is simplicity, it has many limitations, such as not being able to support multi-channel surround sound, advanced codecs, or multiple languages. Keeping audio and video muxed together also makes storage and delivery inefficient, because the audio is duplicated in every video variant. Demuxing audio and video, by contrast, allows the use of containers like fragmented MP4 or CMAF, which are more performant for client devices since they don’t have to demux or transmux the segments in real time.

A multivariant playlist output for this would look something like:

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-INDEPENDENT-SEGMENTS
#EXT-X-STREAM-INF:BANDWIDTH=4255267,AVERAGE-BANDWIDTH=4255267,CODECS="avc1.4d4032,mp4a.40.2",RESOLUTION=2560x1440
manifest_1.m3u8

#EXT-X-STREAM-INF:BANDWIDTH=3062896,AVERAGE-BANDWIDTH=3062896,CODECS="avc1.4d4028,mp4a.40.2",RESOLUTION=1920x1080
manifest_2.m3u8

#EXT-X-STREAM-INF:BANDWIDTH=1591232,AVERAGE-BANDWIDTH=1591232,CODECS="avc1.4d4028,mp4a.40.2",RESOLUTION=1600x900
manifest_3.m3u8

#EXT-X-STREAM-INF:BANDWIDTH=1365632,AVERAGE-BANDWIDTH=1365632,CODECS="avc1.4d401f,mp4a.40.2",RESOLUTION=1280x720
manifest_4.m3u8

#EXT-X-STREAM-INF:BANDWIDTH=862995,AVERAGE-BANDWIDTH=862995,CODECS="avc1.4d401f,mp4a.40.2",RESOLUTION=960x540
manifest_5.m3u8
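
When reading a playlist like this programmatically, note that the CODECS value contains a comma inside its quotes, so splitting the attribute list on every comma gives wrong results. Below is a minimal Python sketch of a quote-aware attribute parser. The `parse_attributes` helper is a hypothetical name used for illustration and is not part of any Bitmovin SDK.

```python
import re

# Matches KEY=value where value is either a quoted string or an unquoted token.
_ATTR_RE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')


def parse_attributes(tag_line: str) -> dict:
    """Split an HLS tag such as #EXT-X-STREAM-INF:... into its attributes."""
    # Everything after the first colon is the attribute list.
    _, _, attr_text = tag_line.partition(":")
    return {key: value.strip('"') for key, value in _ATTR_RE.findall(attr_text)}


line = (
    '#EXT-X-STREAM-INF:BANDWIDTH=4255267,AVERAGE-BANDWIDTH=4255267,'
    'CODECS="avc1.4d4032,mp4a.40.2",RESOLUTION=2560x1440'
)
attrs = parse_attributes(line)
video_codec, audio_codec = attrs["CODECS"].split(",")
print(attrs["RESOLUTION"], video_codec, audio_codec)
```

Because the audio is muxed in, each variant's CODECS attribute carries both a video and an audio codec string.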

Audio/Video demuxing

A better approach is to demux the Audio and Video tracks. Luckily, HLS makes this simple with the EXT-X-MEDIA tag, which is the standard way of declaring alternate content renditions for audio, subtitles, closed captions, or video (mostly used for alternative viewing angles, such as in live sports). With the use of EXT-X-MEDIA to decouple audio from video, we can add many great audio features, such as alternate/dubbed language tracks, surround sound tracks, multiple audio qualities, and multi-codec audio.

By supplying audio tracks with EXT-X-MEDIA tags, we can explicitly add each audio track that we want to output as well as group them together. Then we can correlate each Video Variant (EXT-X-STREAM-INF) to one of the grouped Audio Media Playlists.

Using the previous example of a single AAC Stereo Audio track, a demuxed audio/video output would look like:

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-INDEPENDENT-SEGMENTS

#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="AAC_Stereo",LANGUAGE="en",NAME="English - Stereo",AUTOSELECT=YES,DEFAULT=YES,URI="audio_aac.m3u8"

#EXT-X-STREAM-INF:...,CODECS="avc1.4d4032,mp4a.40.2",RESOLUTION=2560x1440,AUDIO="AAC_Stereo"
manifest_1.m3u8

#EXT-X-STREAM-INF:...,CODECS="avc1.4d4028,mp4a.40.2",RESOLUTION=1920x1080,AUDIO="AAC_Stereo"
manifest_2.m3u8

#EXT-X-STREAM-INF:...,CODECS="avc1.4d4028,mp4a.40.2",RESOLUTION=1600x900,AUDIO="AAC_Stereo"
manifest_3.m3u8

#EXT-X-STREAM-INF:...,CODECS="avc1.4d401f,mp4a.40.2",RESOLUTION=1280x720,AUDIO="AAC_Stereo"
manifest_4.m3u8

#EXT-X-STREAM-INF:...,CODECS="avc1.4d401f,mp4a.40.2",RESOLUTION=960x540,AUDIO="AAC_Stereo"
manifest_5.m3u8

Here, you can see that we first declare a single Audio Media (EXT-X-MEDIA) playlist for our audio track and give it a GROUP-ID attribute value of “AAC_Stereo”. Then each Video Variant EXT-X-STREAM-INF tag uses the AUDIO attribute to associate its video track with the Audio Media group “AAC_Stereo”.
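
The relationship between the EXT-X-MEDIA group and the AUDIO attribute can be sketched in a few lines of Python. This is only an illustration under assumptions: `demuxed_playlist` is a hypothetical helper, and the BANDWIDTH values are reused from the muxed example purely as placeholders.

```python
def demuxed_playlist(group_id, audio_uri, variants):
    """Return a multivariant playlist that points every variant at one audio group.

    `variants` is a list of (bandwidth, codecs, resolution, uri) tuples.
    """
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        "#EXT-X-INDEPENDENT-SEGMENTS",
        "",
        # One audio rendition, declared once and shared by all variants.
        f'#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="{group_id}",LANGUAGE="en",'
        f'NAME="English - Stereo",AUTOSELECT=YES,DEFAULT=YES,URI="{audio_uri}"',
    ]
    for bandwidth, codecs, resolution, uri in variants:
        lines.append("")
        # AUDIO must match the GROUP-ID declared above.
        lines.append(
            f'#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},CODECS="{codecs}",'
            f'RESOLUTION={resolution},AUDIO="{group_id}"'
        )
        lines.append(uri)
    return "\n".join(lines) + "\n"


playlist = demuxed_playlist(
    "AAC_Stereo",
    "audio_aac.m3u8",
    [
        (4255267, "avc1.4d4032,mp4a.40.2", "2560x1440", "manifest_1.m3u8"),
        (3062896, "avc1.4d4028,mp4a.40.2", "1920x1080", "manifest_2.m3u8"),
    ],
)
print(playlist)
```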

Multiple audio bitrates

But now let’s imagine we want to better optimize our Adaptive Streaming by delivering our AAC Stereo audio in multiple bitrates, such as a high (196kbps) and a low (64kbps) rendition, so that the higher resolution Video Variants can take advantage of higher quality audio given the extra bandwidth available when streaming those variants. We can accomplish this by encoding our audio into both a low and a high bitrate output and grouping them separately, then deciding which Video Variant gets which audio bitrate. For example, our 720p or below variants get the lower quality audio by default, and our variants above 720p get the higher quality audio by default. Just think of these as defaults though, because most modern Players that stream HLS allow for independently picking which audio quality to play based on Adaptive-Bitrate streaming conditions.

An example utilizing a low and a high AAC Stereo track would look like:

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-INDEPENDENT-SEGMENTS

#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="aac-stereo-64",LANGUAGE="en",NAME="English - Stereo",AUTOSELECT=YES,DEFAULT=YES,URI="audio_aac_64k.m3u8"
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="aac-stereo-196",LANGUAGE="en",NAME="English - Stereo",AUTOSELECT=YES,DEFAULT=NO,URI="audio_aac_196k.m3u8"

#EXT-X-STREAM-INF:...,CODECS="avc1.4d4032,mp4a.40.2",RESOLUTION=2560x1440,AUDIO="aac-stereo-196"
manifest_1.m3u8

#EXT-X-STREAM-INF:...,CODECS="avc1.4d4028,mp4a.40.2",RESOLUTION=1920x1080,AUDIO="aac-stereo-196"
manifest_2.m3u8

#EXT-X-STREAM-INF:...,CODECS="avc1.4d4028,mp4a.40.2",RESOLUTION=1600x900,AUDIO="aac-stereo-196"
manifest_3.m3u8

#EXT-X-STREAM-INF:...,CODECS="avc1.4d401f,mp4a.40.2",RESOLUTION=1280x720,AUDIO="aac-stereo-64"
manifest_4.m3u8

#EXT-X-STREAM-INF:...,CODECS="avc1.4d401f,mp4a.40.2",RESOLUTION=960x540,AUDIO="aac-stereo-64"
manifest_5.m3u8

In this example, we now have two audio tracks, one for each bitrate, and therefore have two Audio Media (EXT-X-MEDIA) playlists defined, each having a unique GROUP-ID attribute but the same NAME attribute. This is a good way of declaring that the audio tracks are the same language, channel config, and codec, but at different qualities. Now, we can declare that each Video Variant (EXT-X-STREAM-INF) that is 720p or less sets the AUDIO group for that variant to the low bitrate Audio Track (GROUP-ID="aac-stereo-64"), and those variants above 720p get the higher bitrate AUDIO group (GROUP-ID="aac-stereo-196") by default (but again, most Players can manage the audio tracks independently for optimal adaptive streaming).

This is at least an improvement on the previous single-bitrate audio packaging, but there are still plenty of enhancements we can make!
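
The default assignment used in this playlist can be written as a simple height threshold. The sketch below is illustrative only: `default_audio_group` is a hypothetical helper, and the 720-line cutoff is the one chosen in this example rather than anything mandated by HLS.

```python
LOW_GROUP = "aac-stereo-64"
HIGH_GROUP = "aac-stereo-196"


def default_audio_group(resolution: str, max_low_height: int = 720) -> str:
    """Pick the default AUDIO group for a variant from its RESOLUTION value."""
    _, height = (int(part) for part in resolution.split("x"))
    return LOW_GROUP if height <= max_low_height else HIGH_GROUP


for res in ["2560x1440", "1920x1080", "1600x900", "1280x720", "960x540"]:
    print(res, default_audio_group(res))
```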

More efficient AAC

The previous examples all rely on Low Complexity AAC (AAC-LC) because this basic audio codec is supported by every playback device. It is necessary to always have at least one AAC-LC track to be able to support older devices. However, most devices these days can support more efficient versions of AAC such as High Efficiency AAC (AAC-HE), which comes in two main versions: v2, which is used for bitrates up to 48kbps, and v1, which is used for bitrates up to 96kbps.
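
As a rough sketch of the bitrate ranges above, the following Python helper maps a target stereo bitrate to an AAC flavor and to the codec string used in the CODECS attribute. The function name is hypothetical, and the thresholds are simply the ones quoted in this article, not hard limits of the codecs.

```python
def pick_aac_profile(bitrate_kbps: int) -> tuple:
    """Return (profile name, HLS codec string) for a target stereo bitrate."""
    if bitrate_kbps <= 48:
        return ("AAC-HE v2", "mp4a.40.29")
    if bitrate_kbps <= 96:
        return ("AAC-HE v1", "mp4a.40.5")
    return ("AAC-LC", "mp4a.40.2")


for kbps in (32, 64, 128):
    print(kbps, pick_aac_profile(kbps))
```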

So let’s adapt our previous example to not rely on 2 (or more) different AAC-LC audio tracks, and instead output one AAC-HE v1, one AAC-HE v2, and one AAC-LC rendition. The tricky part here is that we will want to group each of the above into a different GROUP-ID so that the Player client can decide which to use based on which codecs it supports – but we also will want each Video Variant to be able to use any of those audio tracks. To accomplish this, all we need to do is duplicate each Video Variant for each of the 3 unique Audio Media GROUP-IDs.

A note on grouping audio renditions

The Apple authoring spec recommends creating one audio group for each pair of codec and channel count.

We now have 3 different versions of the AAC codec, so we will have 3 different audio groups.

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-INDEPENDENT-SEGMENTS

#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="aac_lc-stereo-128k",LANGUAGE="en",NAME="English - Stereo",AUTOSELECT=YES,DEFAULT=YES,URI="audio_aaclc_128k.m3u8"
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="aac_he1-stereo-64k",LANGUAGE="en",NAME="English - Stereo",AUTOSELECT=YES,DEFAULT=NO,URI="audio_aache1_64k.m3u8"
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="aac_he2-stereo-32k",LANGUAGE="en",NAME="English - Stereo",AUTOSELECT=YES,DEFAULT=NO,URI="audio_aache2_32k.m3u8"

#EXT-X-STREAM-INF:...,CODECS="avc1.4d4032,mp4a.40.2",RESOLUTION=2560x1440,AUDIO="aac_lc-stereo-128k"
manifest_1.m3u8

#EXT-X-STREAM-INF:...,CODECS="avc1.4d4032,mp4a.40.5",RESOLUTION=2560x1440,AUDIO="aac_he1-stereo-64k"
manifest_1.m3u8

#EXT-X-STREAM-INF:...,CODECS="avc1.4d4032,mp4a.40.29",RESOLUTION=2560x1440,AUDIO="aac_he2-stereo-32k"
manifest_1.m3u8

## Repeat above approach for each additional Video Variant

In this example, you can see that we replicated the 1440p variant 3 times, one for each Audio Media GROUP-ID, and this would then be repeated for each additional Video Variant. This allows the client Player to decide, for a given Video Variant, which audio track group to use based upon codec support and streaming conditions. Also take note of how each Video Variant’s CODECS attribute is updated to represent the necessary audio codec identifier.
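
The "repeat for each additional Video Variant" step is a cross product of video variants and audio groups. Here is a minimal Python sketch of that expansion. The helper name and all bitrates are made-up placeholders, and BANDWIDTH is approximated as video plus audio bitrate for illustration.

```python
from itertools import product

# (video codec, resolution, video bitrate in bps, playlist URI); bitrates are placeholders.
VIDEOS = [
    ("avc1.4d4032", "2560x1440", 4_000_000, "manifest_1.m3u8"),
    ("avc1.4d4028", "1920x1080", 3_000_000, "manifest_2.m3u8"),
]
# GROUP-ID -> (audio codec string, audio bitrate in bps)
AUDIO_GROUPS = {
    "aac_lc-stereo-128k": ("mp4a.40.2", 128_000),
    "aac_he1-stereo-64k": ("mp4a.40.5", 64_000),
    "aac_he2-stereo-32k": ("mp4a.40.29", 32_000),
}


def expand_variants(videos, audio_groups):
    """Emit one (EXT-X-STREAM-INF tag, URI) pair per video variant and audio group."""
    entries = []
    for video, (group_id, (acodec, abps)) in product(videos, audio_groups.items()):
        vcodec, resolution, vbps, uri = video
        tag = (
            f"#EXT-X-STREAM-INF:BANDWIDTH={vbps + abps},"
            f'CODECS="{vcodec},{acodec}",RESOLUTION={resolution},AUDIO="{group_id}"'
        )
        entries.append((tag, uri))
    return entries


for tag, uri in expand_variants(VIDEOS, AUDIO_GROUPS):
    print(tag)
    print(uri)
```

Two video variants and three audio groups yield six EXT-X-STREAM-INF entries, each pointing at the same video playlist URI as its siblings but advertising a different audio codec.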

Surround sound audio

Now, let’s say we also want to support 5.1 surround sound for those clients which can benefit from it. First we decide which surround sound codec we want to support; let’s use Dolby Digital AC-3 for this example. Since we are now relying on a more advanced audio codec for an optimal surround experience, it is also important to consider devices that may have 5.1 or greater speaker setups but can NOT support Dolby Digital. For these we will also include a secondary 5.1 track using the basic AAC-LC codec. Now, we will create 2 new Audio Media playlists with unique GROUP-ID and NAME attributes.

A note on downmixing from 5.1 audio sources

In this example, we will assume the source has a Dolby Digital surround audio track. From that single audio source, we will create our AC-3 surround track, implicitly convert to our AAC surround track, and automatically downmix the source 5.1 to our various AAC 2.0 Stereo outputs using the Bitmovin Encoder, as shown in the sample code at the bottom of this article. Alternatively, you can do all sorts of mixing and channel-swapping, or work with distinct audio input files, such as a separate file for each channel. You can learn more about that here.

Don’t forget about grouping audio renditions

As previously mentioned, the Apple authoring spec recommends creating one audio group for each pair of codec and channel count.

We now have 5 unique combinations of codecs and channel counts, so we will have 5 different audio groups.

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-INDEPENDENT-SEGMENTS

#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="aac_lc-stereo-128k",LANGUAGE="en",NAME="English - Stereo",AUTOSELECT=YES,DEFAULT=YES,URI="audio_aac_128k.m3u8"
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="aac_he1-stereo-64k",LANGUAGE="en",NAME="English - Stereo",AUTOSELECT=YES,DEFAULT=NO,URI="audio_aache1_64k.m3u8"
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="aac_he2-stereo-32k",LANGUAGE="en",NAME="English - Stereo",AUTOSELECT=YES,DEFAULT=NO,URI="audio_aache2_32k.m3u8"
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="aac_lc-5_1-320k",LANGUAGE="en",NAME="English - 5.1",AUTOSELECT=YES,DEFAULT=NO,URI="audio_aac_lc_5_1_320k.m3u8"
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="dolby",LANGUAGE="en",NAME="English - Dolby",CHANNELS="6",URI="audio_dolbydigital.m3u8"

#EXT-X-STREAM-INF:...,CODECS="avc1.4d4032,mp4a.40.2",RESOLUTION=2560x1440,AUDIO="aac_lc-stereo-128k"
manifest_1.m3u8

#EXT-X-STREAM-INF:...,CODECS="avc1.4d4032,mp4a.40.5",RESOLUTION=2560x1440,AUDIO="aac_he1-stereo-64k"
manifest_1.m3u8

#EXT-X-STREAM-INF:...,CODECS="avc1.4d4032,mp4a.40.29",RESOLUTION=2560x1440,AUDIO="aac_he2-stereo-32k"
manifest_1.m3u8

#EXT-X-STREAM-INF:...,CODECS="avc1.4d4032,mp4a.40.2",RESOLUTION=2560x1440,AUDIO="aac_lc-5_1-320k"
manifest_1.m3u8

#EXT-X-STREAM-INF:...,CODECS="avc1.4d4032,ac-3",RESOLUTION=2560x1440,AUDIO="dolby"
manifest_1.m3u8


## Repeat above approach for each additional Video Variant

Here you can see that now we have the 1440p variant replicated a total of 5 times, once for each Audio Media GROUP-ID which allows the client Player to select the most appropriate audio and video track combination.

Again, note how each duplicated Video Variant has an updated CODECS attribute to represent the audio codec associated with it. One major reason we duplicate each Video Variant for each Audio Media GROUP-ID is that most devices cannot handle switching between audio codecs during playback, so as the Adaptive-Bitrate logic on the Player switches between different Video Variants, it will pick the variant that has the same audio codec it has been using. Additionally, in HLS, we cannot simply list the Video Variant once and add all of the various audio codecs to the CODECS attribute. This is because, per HLS, the client device MUST be able to support all of the CODECS mentioned on a given Video Variant (EXT-X-STREAM-INF) to avoid possible playback failures. So instead, we separate out the Video Variants per codec and channel count combination.
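
To make the "must support all CODECS" rule concrete, here is a small Python sketch of how a client might filter variants. It is a simplified model under assumptions, not how any particular Player is implemented; the helper names and the device capability set are hypothetical.

```python
def codec_supported(codec: str, supported: set) -> bool:
    """True if `codec` equals a supported entry or is a dotted refinement of one."""
    return any(codec == s or codec.startswith(s + ".") for s in supported)


def playable_variants(variants, supported):
    """Keep only variants whose CODECS are ALL supported by the device."""
    return [
        v
        for v in variants
        if all(codec_supported(c, supported) for c in v["codecs"].split(","))
    ]


VARIANTS = [
    {"codecs": "avc1.4d4032,mp4a.40.2", "audio": "aac_lc-stereo-128k"},
    {"codecs": "avc1.4d4032,mp4a.40.5", "audio": "aac_he1-stereo-64k"},
    {"codecs": "avc1.4d4032,mp4a.40.29", "audio": "aac_he2-stereo-32k"},
    {"codecs": "avc1.4d4032,ac-3", "audio": "dolby"},
]

# Hypothetical device: H.264 video, AAC-LC and AAC-HE v1 audio, no Dolby Digital.
device = {"avc1", "mp4a.40.2", "mp4a.40.5"}
for variant in playable_variants(VARIANTS, device):
    print(variant["audio"])
```

If all audio codecs were listed on a single variant instead, this device would have to reject that variant entirely, which is why the variants are duplicated per audio group.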

Multi-language audio

This is all great, but what if I want to support additional dubbed audio language tracks or even Descriptive Audio tracks? Luckily, that is rather simple to do. We can just create additional Audio Media playlists for each language and add them to the existing GROUP-IDs, which are already logically grouped by codec and channel pairing per the Apple authoring spec.

#EXTM3U
#EXT-X-INDEPENDENT-SEGMENTS
#EXT-X-VERSION:6
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="AAC-HE-V1-Stereo",NAME="English-Stereo",LANGUAGE="en",DEFAULT=NO,URI="audio_aache1_stereo.m3u8"
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="AAC-HE-V1-Stereo",NAME="Spanish-Stereo",LANGUAGE="es",DEFAULT=NO,URI="audio_aache1_stereo_es.m3u8"

#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="AAC-HE-V2-Stereo",NAME="English-Stereo",LANGUAGE="en",DEFAULT=NO,URI="audio_aache2_stereo.m3u8"
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="AAC-HE-V2-Stereo",NAME="Spanish-Stereo",LANGUAGE="es",DEFAULT=NO,URI="audio_aache2_stereo_es.m3u8"

#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="AAC-LC-5.1",NAME="English-5.1",LANGUAGE="en",DEFAULT=NO,URI="audio_aaclc-5_1.m3u8"
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="AAC-LC-5.1",NAME="Spanish-5.1",LANGUAGE="es",DEFAULT=NO,URI="audio_aaclc-5_1_es.m3u8"

#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="AAC-LC-Stereo",NAME="English-Stereo",LANGUAGE="en",DEFAULT=NO,URI="audio_aaclc_stereo.m3u8"
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="AAC-LC-Stereo",NAME="Spanish-Stereo",LANGUAGE="es",DEFAULT=NO,URI="audio_aaclc_stereo_es.m3u8"

#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="AC-3-5.1",NAME="English-Dolby",LANGUAGE="en",CHANNELS="6",DEFAULT=NO,URI="dolby-ac3-5_1.m3u8"
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="AC-3-5.1",NAME="Spanish-Dolby",LANGUAGE="es",CHANNELS="6",DEFAULT=NO,URI="dolby-ac3-5_1_es.m3u8"

#EXT-X-STREAM-INF:...,CODECS="avc1.4D401F,ac-3",RESOLUTION=1280x720,AUDIO="AC-3-5.1"
video_720_3000000.m3u8

#EXT-X-STREAM-INF:...,CODECS="avc1.4D401F,mp4a.40.29",RESOLUTION=1280x720,AUDIO="AAC-HE-V2-Stereo"
video_720_3000000.m3u8

#EXT-X-STREAM-INF:...,CODECS="avc1.4D401F,mp4a.40.2",RESOLUTION=1280x720,AUDIO="AAC-LC-Stereo"
video_720_3000000.m3u8

#EXT-X-STREAM-INF:...,CODECS="avc1.4D401F,mp4a.40.2",RESOLUTION=1280x720,AUDIO="AAC-LC-5.1"
video_720_3000000.m3u8

#EXT-X-STREAM-INF:...,CODECS="avc1.4D401F,mp4a.40.5",RESOLUTION=1280x720,AUDIO="AAC-HE-V1-Stereo"
video_720_3000000.m3u8
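
Adding a language amounts to adding one EXT-X-MEDIA entry to every existing group. The Python sketch below generates such entries for a subset of the groups above. The `media_tags` helper and the `_es`-style URI suffix rule are assumptions modeled on the file names in this example.

```python
# (GROUP-ID, label used in NAME, URI stem, CHANNELS value or None)
GROUPS = [
    ("AAC-LC-Stereo", "Stereo", "audio_aaclc_stereo", None),
    ("AAC-LC-5.1", "5.1", "audio_aaclc-5_1", None),
    ("AC-3-5.1", "Dolby", "dolby-ac3-5_1", "6"),
]
LANGUAGES = [("en", "English"), ("es", "Spanish")]


def media_tags(groups, languages, primary="en"):
    """Build one EXT-X-MEDIA tag per (group, language) combination."""
    tags = []
    for group_id, label, stem, channels in groups:
        for code, lang_name in languages:
            # The primary language keeps the bare file name; others get a suffix.
            suffix = "" if code == primary else f"_{code}"
            attrs = [
                "TYPE=AUDIO",
                f'GROUP-ID="{group_id}"',
                f'NAME="{lang_name}-{label}"',
                f'LANGUAGE="{code}"',
            ]
            if channels:
                attrs.append(f'CHANNELS="{channels}"')
            attrs.append("DEFAULT=NO")
            attrs.append(f'URI="{stem}{suffix}.m3u8"')
            tags.append("#EXT-X-MEDIA:" + ",".join(attrs))
    return tags


for tag in media_tags(GROUPS, LANGUAGES):
    print(tag)
```

The EXT-X-STREAM-INF entries do not change when a language is added, because they reference the group rather than an individual rendition.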

How does this differ from DASH?

In DASH, demuxed Audio and Video tracks are grouped into separate AdaptationSets for a given period. This means each given Video AdaptationSet is not directly linked to one specific Audio track, but rather the client Player independently picks a Video Representation from the Video AdaptationSet and an Audio Representation from the Audio AdaptationSet. So with DASH, we don’t have to worry about re-stating Video tracks for each group of Audio tracks as they are managed independently of each other.
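
A rough count shows the difference in manifest size. The sketch below assumes the 5 video variants and 5 audio groups used earlier in this article, with one audio track per group; the function names are hypothetical.

```python
def hls_stream_inf_entries(video_variants: int, audio_groups: int) -> int:
    """HLS restates every video variant once per audio group."""
    return video_variants * audio_groups


def dash_representations(video_reps: int, audio_reps: int) -> int:
    """DASH lists video and audio Representations independently."""
    return video_reps + audio_reps


print(hls_stream_inf_entries(5, 5), dash_representations(5, 5))
```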

Additional notes

The video codecs you choose to support may also determine which audio codecs and container formats you use. For example, if you encode video to VP9, you may want to consider using the Vorbis or Opus audio codecs.

In this example, we used AC-3 for Dolby Digital 5.1, but you may consider using Enhanced AC-3, more commonly referred to as E-AC-3, for additional channel support (such as 7.1 or more) or spatial audio support like Dolby Atmos. Other premium surround sound codec options are DTS-HD and DTS:X.

Premium HLS audio example with the Bitmovin Encoder & Manifest Generator

The GitHub sample linked below is a pseudo-code example using the Bitmovin JavaScript/TypeScript SDK that demonstrates outputting multi-bitrate, multi-codec, multi-channel, and multi-language audio tracks. This can greatly enhance the user’s experience, as it allows for streaming the best quality and most appropriate audio for each device’s codec support and speaker channel configuration.

With the Bitmovin Encoder, we can use one master (Dolby Digital surround in this example) audio file/stream for each language and easily downmix it to 2.0 stereo or implicitly convert it to AAC 5.1. Then, once we simply create each desired audio track, we will use the Bitmovin Manifest Generator to create our HLS multivariant playlists.

Encoding Example For HLS With Multiple Audio Layers

The post Providing a Premium Audio Experience in HLS with the Bitmovin Encoder appeared first on Bitmovin.

Using Spatial Audio to Improve User Experience

James Konik — Wed, 18 Aug 2021 20:24:07 +0000

Spatial audio aims to place you at the center of a three-dimensional listening experience. Done right, it goes beyond regular stereo audio, surrounding you with sound and fully immersing the listener. With many real-world music venues closed down due to COVID, it is a great way to make you feel like you’re out of your living room and getting a taste of the real thing.
As a video content provider, offering a spatial audio experience for your customers can give you that extra push of immersion that will set you apart from your competitors. In this article, we’ll look at three of the biggest players in spatial audio and see what they bring to the table. If you want to know which vision matches yours most closely, this guide will show you.

What Is Spatial Audio?

Stereo audio involves separating sound into two separate channels. This can be done by recording with two microphones, or by artificially varying audio properties over each channel. This is used in the common, two-speaker setups most people have to add direction and depth to sound.
Spatial audio goes further than this. As well as using additional speakers, it includes height information, enabling sound to come from above or below you. It also enables objects in the audio to come from specific positions, like the top left corner or directly behind you.
When applied to music, cinema, and games it can take realism to a whole new level. There are several competing versions of the technology, each vying for the attention of creators and users alike. Dolby Atmos, Sony 360 Reality Audio, and Apple Spatial Audio are the major players and have their own pros and cons.

Dolby Atmos

Dolby Atmos is the venerable granddaddy of the spatial audio scene. Since launching in 2012 it has become very well established, and there is no shortage of Atmos content available across multiple platforms and streaming services. 99% of global consumers can access a service with Dolby Vision and Dolby Atmos content.
Its object-based audio means each element in the soundtrack can be given a location and direction. A speaker setup can recreate that, so it sounds as if everything comes from a specific place. It also features height virtualization. This uses filters to simulate audio cues that the ear uses to differentiate between sounds coming from different heights. With 128 tracks there’s the potential for some extremely in-depth soundscapes.
Over 6000 cinemas already use the technology, and it is used in many films, music content as well as sports events.
With such widespread support, you can experience Dolby Atmos via Blu-ray disc, 7 of the top 10 streaming services including Netflix, Disney+, Apple TV+ and HBO Max, music streaming services including Apple Music, Amazon, and Tidal, and live sports events with SKY UK, BT Sports, and Comcast. Not every service works with it on every device, so be sure to do your research when setting it up. However, it’s gradually becoming accessible to more streaming services, as back-end video solutions such as Bitmovin’s encoding make it easier to build Dolby Atmos-supported workflows.
Cinema-based Atmos setups make use of large numbers of speakers, including many embedded in the walls and ceiling. At home, you don’t need to go that far, with two to four ceiling speakers helping you get plenty out of the system. Even easier setup can be found with the latest soundbar systems, which aim the sound upward to reflect off of the ceiling, making the setup literally plug and play.

Dolby Atmos In-Home Experience

Plenty of TVs and other equipment already work with Dolby Atmos, making it a great choice for newcomers to spatial audio. Even those of us with regular headphones can join the party. The immersive experience and wide availability make it worth it.

Verdict

Dolby Atmos is a well-established format, and it’s easy to find hardware and content that works with it. It offers rich, directional sound in a variety of settings.
Pros
* Widespread content support.
* The largest quantity of compatible hardware.

Sony 360 Reality Audio

Sony 360 Reality Audio is a new technology that delivers an amazing listening experience through headphones. It aims to sound as good as speakers, with audio coming from multiple directions. It allows you to control the location of all audio elements over 64 speaker channels. It uses the MPEG-H 3D audio standard, and Sony has worked with the Fraunhofer IIS to make the format accessible to creatives.
Its technological secret sauce is “binaural rendering”, which works by varying sounds in the same ways your brain uses to work out where a sound source comes from. Sounds can reach your ears at different times and are also altered by the shape of your head and ears. Several streaming services currently support the format. You can listen to it on Tidal, Deezer, and Nugs. Amazon Music HD also works with it, but you’ll need the right speaker setup.
The format is also compatible with video streaming. There’s little support for it now, but more is on the way.
It doesn’t need much specialized hardware, just high-definition compatible headphones. It uses an app that lets you scan your ear shape and send the results to streaming services, for a uniquely personalized experience. You can also use it with the Amazon Echo Studio speaker. Several major artists, such as Tony Visconti and Pharrell Williams, are producing music with the format, and there are already over 1000 songs available to listen to.
For those interested in getting into it themselves, Sony has a licensing program that includes mixing software. There’s also a recommended infrastructure to help get your studio working with it. The future looks bright for Reality Audio, with comments in Android’s Open Source Project suggesting it could arrive on Android soon and be adapted so it works on any speakers or headphones. Definitely one to keep an eye on.

Verdict

Reality Audio uses a range of exciting technologies to make directional sound feel as immersive as possible. Sony offers plenty of help to creatives who want to work with it.
Pros
* Excellent choice for headphone users.
* Dedicated software and support for studios.
Cons
* Relatively limited catalogue.

Apple Spatial Audio

Apple Spatial Audio lets you use existing sound formats with AirPods to give you immersive audio. It works by using filters to make sounds seem to come from any direction, including above or behind you. It can be applied to 5.1, 7.1, or Dolby Atmos sources. That means it doesn’t need a dedicated audio format, unlike the other technologies described here. Its killer feature is tracking. Not only can it track your head position, but it also detects where your device is. That lets it dynamically adjust the filters to make it sound like your audio is coming from the right direction, even over headphones.

It takes advantage of gyroscopes embedded in the AirPods to figure out where you are moving and adjusts the sound direction accordingly. That means you can turn your head towards a sound coming from the side of you, and it will then seem to be in front of you. As well as multi-directional sound, it also keeps sounds distinct, making dialogue easier to hear in movies. It’s available for free, as long as you have the right hardware. That means at least iOS 14 and a more recent Apple device, such as the iPhone 7. You’ll also need AirPods Pro or Max.
Apple Spatial Audio isn’t yet supported on Apple TV devices, despite rumours the new 2021 Apple TV 4K would be compatible. Software support comes from HBO Go, Hulu, Vudu, and Amazon Prime Video, which all use formats that work with it. You can also use plugins to use spatial audio with stereo sources. Netflix support is rumoured to be on the way, too.
If you want to check if spatial audio is working on your device, look for a blue animated icon in the settings. You can find it by touching and holding the volume icon in the control center.

How to turn on Spatial Audio in iOS

Verdict

If you want to reach Apple fans with the latest devices, Spatial Audio is a great choice, delivering an outstanding experience across a broad range of media.
Pros
* Head tracking is a hugely impressive feature.
* Works with existing formats.
Cons
* Tied to specific Apple hardware.

Conclusion

Spatial audio can take your streaming content to the next level. Your audience will be immersed in sound, and the experience will be much better. If you want to give your work an edge over the competition, spatial audio has the next-gen factor to help you do that.
As you’ve learned here, there are multiple versions of the technology, which can be deployed through a range of channels. Using it isn’t as hard as you think.
If you want to learn more about spatial audio, and how you can use it to wow your audience, get in touch with Bitmovin. They specialize in audio and video codecs and can show you how to get the most out of them, delivering quality content while using bandwidth as efficiently as possible. That keeps your costs down too, meaning you can stay more competitive while delivering your content in new and exciting ways.

The post Using Spatial Audio to Improve User Experience appeared first on Bitmovin.

Why Audio Encoding is Just as Important as Video Encoding

Holiviel Valdez — Thu, 06 May 2021 13:20:37 +0000

When you think about streaming online content, you might be tempted to focus on the visual aspects, like a high bit rate or the latest codecs, but this is only half the battle for a superior video experience. The audio quality for any streaming video can be the difference between a good movie night for your clients and a bad one.
In this post, we’ll talk about how audio encoding affects the streaming experience. I’ll cover some basics—like what is a codec—and then discuss the benefits of audio encoding, the pros and cons of the most common codec formats, and how to make sure your audio encoding complements your video encoding.

What Is a Codec?

The term codec is a combination of the words coder and decoder. A codec is a standard for encoding and decoding multimedia files to represent data in a specific format.

The first thing a codec does is encode a video or audio file. For lossy codecs, this involves dropping “extra” information from raw or uncompressed audio files in order to reduce file size while maintaining as much quality as possible. This process involves a sequence of complex mathematical functions.
The second role of a codec is to decode, which is essentially playing back a video or audio file that’s been encoded. Think of it as reversing the math from the encoding step.
In short, an audio codec is a protocol for compressing digital audio to save space during transmission and then decoding for playback with the video.

Advantages of Audio Encoding

If your application delivers audio or video (or even still images), knowing your options for encoding is useful. For instance, if you know the specs of different audio/video codecs and the best use cases for each, you might be able to improve the experience of users with a bad internet connection.
Here are just a handful of advantages of proper audio encoding:

Less storage space is needed: Encoded data files are smaller, so you should be able to save space on your storage. This is ideal if you have large amounts of data that need to be archived.
Data is sent over the network faster: Encoding removes redundancies from data, so again, the size of your files is a lot smaller. This results in faster transfers, even with bad internet connections.
Encoded files consume fewer resources: They reduce the resources required from your machine, like the amount of RAM and processing power when you’re listening to audio files.
Adaptable: Different codec formats are useful for different kinds of projects. For example, the AAC codec supports a range of sampling frequencies and uses joint stereo coding to achieve higher quality, a smaller file size, or, in the best case, both. More advanced audiophiles will notice and appreciate these changes while playing your audio.

Encoding your audio files is a critical part of your video encoding workflow, but just as there are many types of codecs for video, audio has a number of options you can use.

Common Audio Codecs

One important thing to keep in mind when selecting a codec is which devices and services support it. Some streaming services support one audio codec but not another. Some codecs offer better quality, and others focus mainly on compression. Remember, you need a balance between quality and support.
With this in mind, let’s explore some of the most common and best-supported audio codecs.

MP3

MP3 stands for MPEG-1 Audio Layer III (the format was later extended as MPEG-2 Audio Layer III). The most common and well-known audio format, MP3 revolutionized digital audio. Its files were much smaller than previous formats, allowing them to be streamed and downloaded over the internet.
MP3 is a well-supported codec—you can run MP3 files on almost any online or desktop media player, like QuickTime, VLC media player, and Kodi.

AAC

AAC stands for Advanced Audio Coding. Developed a few years after MP3, AAC built on the success of that format but increased compression efficiency. Like most of the more popular codecs, AAC is lossy, but it provides very good audio quality in limited bandwidth, especially when compared to MP3.
It’s a patented format that requires licensing rather than an open, royalty-free one, but it is probably the most widely used audio codec on the internet today. It’s supported by most video-streaming platforms.

AIFF

AIFF stands for Audio Interchange File Format and was developed by Apple. AIFF files are very large, around 10 MB for one minute of standard audio recording.
Most AIFF files contain uncompressed audio in PCM (pulse-code modulation) format. The AIFF file is just a wrapper for the PCM encoding, making it more suitable for use on Mac systems. However, Windows can usually open AIFF files without any issues.
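The 10 MB per minute figure can be checked with a little arithmetic. The sketch below assumes that "standard audio recording" means CD-quality PCM (44.1 kHz, 16-bit, stereo) and ignores the small container header; `pcm_size_bytes` is a hypothetical helper written for this example.

```python
def pcm_size_bytes(sample_rate_hz, bit_depth, channels, seconds):
    """Size of raw, uncompressed PCM audio (container header not included)."""
    bytes_per_sample = bit_depth // 8
    return sample_rate_hz * bytes_per_sample * channels * seconds

# Assumed CD-quality parameters: 44.1 kHz, 16-bit, stereo, one minute.
size = pcm_size_bytes(44_100, 16, 2, 60)
print(size)                       # 10584000
print(f"{size / 1_000_000:.1f} MB")  # 10.6 MB
```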

FLAC

FLAC stands for Free Lossless Audio Codec. A bit on the nose maybe, but it has quickly become one of the most popular lossless formats available since its introduction in 2001. Note that with lossless codecs, all the information is retained when the file is compressed.
FLAC compresses audio files without losing any of the original audio data. What’s even nicer is that it’s an open-source and royalty-free audio file format.
Most major services and common devices support FLAC, and it’s the main alternative to MP3 for music. You basically get the full quality of raw uncompressed audio at roughly half the file size. The problem is that the files are still rather large, so if saving space is your priority, FLAC is not the best option.
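The defining property of a lossless codec is that decoding gives back exactly the bytes that went in. The sketch below illustrates that property using Python's general-purpose `zlib` compressor as a stand-in. This is an assumption made for simplicity: FLAC uses audio-specific techniques and is not based on zlib, and the synthetic sine wave here compresses far better than real music would.

```python
import math
import struct
import zlib

# One period of a sine wave as signed 16-bit PCM, repeated to form a longer clip.
period = [round(20000 * math.sin(2 * math.pi * n / 100)) for n in range(100)]
pcm = struct.pack("<100h", *period) * 441  # about one second of mono audio at 44.1 kHz

compressed = zlib.compress(pcm)
restored = zlib.decompress(compressed)

print(len(pcm), len(compressed))  # original size vs. compressed size in bytes
print(restored == pcm)            # True: every byte is recovered
```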

Ogg (Vorbis)

Ogg isn’t a fancy acronym; it’s just a container format for one or more codecs. Vorbis is a free open-source lossy format often used with Ogg containers and was created specifically to provide that balance between high quality and efficient streaming. It performs significantly better than most other lossy compression formats (meaning it produces a smaller file size for equivalent audio quality).
Since Vorbis is free, it’s been utilized in a number of both commercial and noncommercial media players, including Spotify.

Opus

Much like its predecessor Vorbis, Opus is not an acronym. It is also a free, open-source lossy format, developed by the same people behind Vorbis: Christopher Montgomery and Xiph.org. Opus is much more ambitious in its scope than Vorbis, as it is designed to handle every kind of audio content (including music, speech, and real-time voice communication). It can be carried in major containers such as Ogg, Matroska, WebM, and MPEG-TS.
Opus does just about everything when it comes to audio compression. The caveat is its complexity and CPU requirements, which have limited its current implementations. Despite that, Opus has been rapidly and widely adopted by mainstream platforms and applications, such as WhatsApp, Android, iOS, Windows, and PlayStation.

The Best Audio Codec

Of the commonly used codecs listed here, AAC is the best audio codec for most situations. It’s supported by a wide range of devices and streaming services and has the advantage of better audio quality as compared to MP3.
This may change very soon as Opus becomes more broadly popular. However, hardware doesn’t change as quickly as software, so broad device support is probably still a few years away. For internet video, AAC is currently the best audio codec for live-streaming, as well as video on demand.

Other Considerations for Quality Audio Encoding

Of course, audio encoding is more than just finding the right codec. To get a more complete picture and truly appreciate why audio encoding is just as important as video encoding, let’s consider a few more areas where we can ensure quality audio encoding.

Sample Rates

The sample rate indicates how many times per second the audio signal is measured (sampled) during recording. Sampling frequencies are measured in hertz (Hz) or kilohertz (kHz): 44,100 samples per second can be expressed as 44,100 Hz or 44.1 kHz.
For digital audio recordings, the sample rate is comparable to the frame rate of a video. The more audio data (samples) is collected, the closer the recorded data is to the original audio.
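As a rough illustration, the sketch below samples the same 440 Hz tone at two different sample rates and counts how many data points each one captures. The `sample_sine` helper, the tone frequency, and the 10 ms duration are all assumptions chosen for the example.

```python
import math

def sample_sine(freq_hz, sample_rate_hz, duration_ms):
    """Return the samples of a sine tone taken at the given sample rate."""
    count = sample_rate_hz * duration_ms // 1000
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz) for n in range(count)]

low = sample_sine(440, 8_000, 10)    # telephone-style sample rate
high = sample_sine(440, 44_100, 10)  # CD-style sample rate
print(len(low), len(high))  # 80 441
```

The higher sample rate describes the same 10 ms of sound with more than five times as many measurements.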

Bit Depth

Bit depth measures how many bits were captured in each sample. So the higher the bit depth, the more accurately the actual analog audio source can be expressed.
The lowest possible bit depth only has two options to measure the accuracy of the sound: 0 for total silence and 1 for total volume. The higher the bit depth, the more accurate the encoded sound. Case in point, a standard 16-bit audio CD offers 2^16 (or 65,536) values.
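The relationship between bit depth and accuracy can be sketched in a few lines. The `levels` and `quantize` helpers below are hypothetical, and the amplitude is assumed to be normalized to the range 0 to 1 purely to keep the example simple.

```python
def levels(bit_depth):
    """Number of distinct values a sample can take at this bit depth."""
    return 2 ** bit_depth

def quantize(amplitude, bit_depth):
    """Snap an amplitude in [0, 1] to the nearest representable level."""
    steps = levels(bit_depth) - 1
    return round(amplitude * steps) / steps

print(levels(1), levels(16))  # 2 65536

amplitude = 0.7071  # arbitrary example value
for bits in (1, 8, 16):
    error = abs(quantize(amplitude, bits) - amplitude)
    print(bits, error)  # the error shrinks as the bit depth grows
```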

Bit Rates

The bit rate is the amount of data processed within a given period of time. Common units for bit rate include kbps (kilobits per second) and Mbps (megabits per second). A high bit rate doesn’t guarantee high quality on its own; other factors, like the viewer’s internet speed, also need to be considered. But apart from that, the higher the bit rate, generally the better the audio will sound.
Common approaches to bit rate control include:

Constant Bit Rate (CBR): Keeps the bit rate constant throughout playback. CBR usually encodes faster than VBR, but it does take up more space.
Variable Bit Rate (VBR): The encoder spends more bits on complex passages that need more data and fewer bits on simpler ones. Despite longer encoding times and less consistent support across software and hardware, VBR offers a much better quality-to-storage ratio.
Average Bit Rate (ABR): A subset of VBR. The encoder achieves an average bit rate by having blocks of both lower and higher bit rates.
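The arithmetic behind these terms is straightforward. The sketch below uses hypothetical helper names and example values (a 128 kbps stream, CD-quality PCM, and four equal-length blocks) to show how bit rate relates to file size, what the bit rate of uncompressed audio looks like, and how an average bit rate emerges from blocks encoded at different rates.

```python
def stream_size_bytes(bitrate_kbps, seconds):
    """Bytes needed for audio at a constant bit rate (1 kbps = 1000 bits/s)."""
    return bitrate_kbps * 1000 * seconds // 8

def pcm_bitrate_kbps(sample_rate_hz, bit_depth, channels):
    """Bit rate of uncompressed PCM audio."""
    return sample_rate_hz * bit_depth * channels / 1000

def average_bitrate_kbps(block_bitrates_kbps):
    """Average bit rate over blocks, assuming every block has the same duration."""
    return sum(block_bitrates_kbps) / len(block_bitrates_kbps)

print(stream_size_bytes(128, 60))                 # 960000
print(pcm_bitrate_kbps(44_100, 16, 2))            # 1411.2
print(average_bitrate_kbps([96, 160, 128, 128]))  # 128.0
```

Under these assumptions, one minute of 128 kbps audio takes just under 1 MB, compared with roughly 10.6 MB for the same minute of uncompressed CD-quality PCM.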

Conclusion

Audio encoding isn’t something to ignore during your video encoding process. Paying attention to the technical aspects of audio encoding and optimizing for your use cases, in particular, can go a long way toward ensuring the overall quality of the video you deliver.
Looking for video encoding software with modern audio codecs built-in? Check out Bitmovin and offload some of the technical overhead for video and audio encoding.

The post Why Audio Encoding is Just as Important as Video Encoding appeared first on Bitmovin.