NVIDIA GTC24: Highlights for Video Streaming Workflows

NVIDIA GTC (GPU Technology Conference) is an annual conference with training and exhibitions covering all aspects of GPU (Graphics Processing Unit) accelerated computing. GTC 2024 was held in March with the tagline “The Conference for the Era of AI” and, as expected, generative AI was a huge focus this year. There were also several other emerging applications of AI on display, including advanced robotics, autonomous vehicles, climate modeling and new drug discovery.

Selfie outside NVIDIA GTC24, where I attended the sessions summarized in this post

When GPUs were first introduced, they were mainly used for rendering graphics in video game systems. In the late ‘90s and early 2000s, NVIDIA’s increasingly programmable GPUs opened up new possibilities for accelerated video decoding and transcoding workflows. Even though GPUs may now be more associated with powering AI solutions, they still play an important role in many video applications, and GTC24 featured several sessions and announcements covering the latest video-related updates. Keep reading to learn more about the highlights.

Video technology updates

In a session titled NVIDIA GPU Video Technologies: New Features, Improvements, and Cloud APIs, Abhijit Patait, Sr. Director of Multimedia and AI at NVIDIA, shared the latest updates and features available for processing video with NVIDIA GPUs. Some highlights now available in NVIDIA’s Video Codec SDK 12.2 (a sketch showing how a few of these controls map onto an FFmpeg command follows the comparison figure below):

  • 15% quality improvement for HEVC encoding, thanks to several enhancements:
    • UHQ (Ultra-high quality) tuning info for latency-tolerant use cases
    • Increased lookahead analysis
    • Temporal filtering for noise reduction
    • Unidirectional B-frames for latency-sensitive use cases
  • Encode 8-bit content as 10-bit for higher quality (HEVC and AV1)
Comparison of HEVC encodings with equivalent quality, achieved at 18 Mbps with HQ tuning but only 10 Mbps with the new UHQ tuning – source: GTC24
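To make these numbers a bit more concrete, here is a minimal sketch of how an 8-bit source could be encoded as 10-bit HEVC with NVENC through FFmpeg. It assumes an FFmpeg build with NVENC support and an NVIDIA GPU in the machine; the UHQ tuning mode itself is a Video Codec SDK 12.2 feature whose exposure depends on your FFmpeg version, so the sketch uses the long-standing hq tune and the flag names should be verified against your build.

```python
# Sketch: encoding an 8-bit source as 10-bit HEVC with NVENC via FFmpeg.
# Assumes an FFmpeg build with NVENC support and an NVIDIA GPU/driver present.
# Flag names reflect common hevc_nvenc options; verify them against your FFmpeg version.
import subprocess

def encode_hevc_nvenc(src: str, dst: str, bitrate: str = "10M") -> None:
    cmd = [
        "ffmpeg", "-y",
        "-i", src,
        "-c:v", "hevc_nvenc",
        "-preset", "p7",          # slowest / highest-quality NVENC preset
        "-tune", "hq",            # SDK 12.2's UHQ tuning may appear as a separate tune in newer builds
        "-rc", "vbr",
        "-b:v", bitrate,
        "-rc-lookahead", "32",    # increased lookahead analysis
        "-b_ref_mode", "middle",  # use B-frames as references
        "-pix_fmt", "p010le",     # encode the 8-bit input as 10-bit HEVC
        "-c:a", "copy",
        dst,
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    encode_hevc_nvenc("input_8bit.mp4", "output_10bit_hevc.mp4")
```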

There were also several “Connect with Experts” sessions where attendees could meet and ask questions of various NVIDIA subject matter experts. In the Building Efficient Video Transcoding Pipelines Enabling 8K session, they shared how multiple NVENC instances can be used in parallel for split-frame encoding to speed up 8K transcoding workflows. This topic is also covered in detail in their developer blog here, and a simplified sketch of the strip-partitioning idea follows the figure below.

Split frame encoding with NVIDIA GPUs – source: NVIDIA developer blog
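For illustration only, the sketch below shows how the rows of an 8K frame might be partitioned into strips, one per NVENC engine. The actual split-frame encode and bitstream reassembly happen inside the Video Codec SDK and driver; the strip count and 64-row alignment here are assumptions made for the example.

```python
# Conceptual sketch of the partitioning behind split-frame encoding: an 8K frame is divided
# into horizontal strips, one per NVENC engine, which are encoded in parallel. This only
# derives the strip boundaries; the real work is done by the Video Codec SDK/driver.
from dataclasses import dataclass

@dataclass
class Strip:
    engine: int   # which NVENC instance handles this strip
    top: int      # first row of the strip
    height: int   # strip height in rows

def partition_frame(frame_height: int, num_engines: int, align: int = 64) -> list[Strip]:
    """Split a frame into roughly equal, row-aligned strips, one per NVENC engine."""
    base = frame_height // num_engines
    base -= base % align                 # keep strip boundaries block-aligned (illustrative value)
    strips, top = [], 0
    for engine in range(num_engines):
        height = base if engine < num_engines - 1 else frame_height - top
        strips.append(Strip(engine, top, height))
        top += height
    return strips

if __name__ == "__main__":
    for strip in partition_frame(frame_height=4320, num_engines=3):  # 8K UHD is 7680x4320
        print(strip)
```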

VMAF-CUDA: Faster video quality analysis

Snap and NVIDIA gave a joint presentation on a collaborative project (which also included participation from Netflix) to optimize and implement VMAF (Video Multi-Method Assessment Fusion) quality calculations on NVIDIA CUDA cores. CUDA (Compute Unified Device Architecture) cores are general-purpose processing units on NVIDIA GPUs that enable parallel processing, complementing the GPU’s dedicated video encode/decode circuits.

NVIDIA GPU video capabilities and components – source: nvidia.com
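As a toy illustration of the kind of data-parallel, per-pixel work CUDA cores excel at (and that VMAF-CUDA maps onto the GPU), here is a small kernel computing per-pixel absolute differences between a reference and a distorted frame. It uses Numba’s CUDA JIT purely as an assumed convenience and is not part of libvmaf or the VMAF-CUDA implementation.

```python
# Toy example of data-parallel per-pixel work on CUDA cores, using Numba's CUDA JIT.
# Not part of libvmaf; just a sketch of the programming model that VMAF-CUDA builds on.
import numpy as np
from numba import cuda

@cuda.jit
def abs_diff_kernel(ref, dist, out):
    i = cuda.grid(1)              # one GPU thread per pixel
    if i < ref.size:
        out[i] = abs(ref[i] - dist[i])

def per_pixel_abs_diff(ref: np.ndarray, dist: np.ndarray) -> np.ndarray:
    ref_f = ref.astype(np.float32).ravel()
    dist_f = dist.astype(np.float32).ravel()
    out = cuda.device_array_like(ref_f)
    threads = 256
    blocks = (ref_f.size + threads - 1) // threads
    abs_diff_kernel[blocks, threads](cuda.to_device(ref_f), cuda.to_device(dist_f), out)
    return out.copy_to_host().reshape(ref.shape)

if __name__ == "__main__":
    a = np.random.randint(0, 256, (1080, 1920), dtype=np.uint8)
    b = np.random.randint(0, 256, (1080, 1920), dtype=np.uint8)
    print("mean per-pixel difference:", per_pixel_abs_diff(a, b).mean())
```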

During the talk, they explained how implementing VMAF-CUDA enabled Snap to run their video quality assessments in parallel to the transcoding being done on NVIDIA GPUs. The new method runs several times faster and more efficiently than running VMAF on CPU instances. It was so successful that Snap is now planning to transition all VMAF calculations to GPUs, even for transcoding workflows that are CPU-based. They also published the technical details in this blog post for those interested in learning more. 

VMAF calculation speed comparison, GPU vs CPU – source: NVIDIA developer blog
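If you want to experiment with VMAF yourself, a minimal sketch using FFmpeg’s libvmaf filter is shown below. It assumes an FFmpeg build compiled with libvmaf; CUDA-enabled builds expose a GPU path (commonly as a libvmaf_cuda filter, though that name and its availability should be verified against your build).

```python
# Sketch: computing VMAF between a distorted encode and its reference with FFmpeg's libvmaf filter.
# Assumes an FFmpeg build with libvmaf enabled; GPU-accelerated builds offer a CUDA path as well.
import subprocess

def run_vmaf(distorted: str, reference: str, log_path: str = "vmaf.json") -> None:
    # libvmaf takes the distorted stream as the first input and the reference as the second
    filter_graph = f"[0:v][1:v]libvmaf=log_fmt=json:log_path={log_path}"
    cmd = [
        "ffmpeg",
        "-i", distorted,    # the encode under test
        "-i", reference,    # the pristine reference
        "-lavfi", filter_graph,
        "-f", "null", "-",  # discard the video output, we only want the VMAF log
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    run_vmaf("encode_1080p.mp4", "reference_1080p.mp4")
```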

Netflix Vision AI workflows

In a joint presentation by Netflix and NVIDIA, Streamed Video Processing for Cloud-Scale Vision AI Services, they shared how Netflix is using computer vision and AI at scale throughout their stack. Netflix is a bit unique not only in their massive scale, but also that they are vertically integrated and have people working on every part of the chain from content creation through distribution. This opens a lot of opportunities for using AI along with the challenge of deploying solutions at scale. 

They shared examples from: 

  • Pre-production: Storyboarding, Pre-visualization
  • Post-production: QC, Compositing and visual fx, Video search
  • Promotional media: Generating multi-format artwork, posters and trailers; synopsis generation
  • Globalization/localization of content: Multi-language subtitling and dubbing

They also discussed the pros and cons of using an off-the-shelf framework like NVIDIA’s DeepStream SDK for computer vision workflows (ease of use, efficiency of setup) versus building your own modular workflow (customization, efficiency of use) with components like CV-CUDA operators for pre- and post-processing of images and TensorRT for deep-learning inference.

They also went into some detail on one application of computer vision in post-production, where they used object detection to identify when the clapperboard appeared in footage and sync the audio with the moment it closed, with sub-frame precision. This has been a tedious, manual process for editors in the motion picture industry for decades, and they are now able to automate it with consistent, precise results. While this sits more on the content creation side, it’s not hard to imagine the same method being used to automate some QA/QC processes on the content processing and distribution side.
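As a rough sketch of the timing logic such a workflow might use, the snippet below interpolates the sub-frame moment the clapper closes from per-frame detections and derives the corresponding audio offset. The detection structure is a hypothetical stand-in; Netflix’s actual pipeline uses trained vision models (e.g. served through TensorRT), not this stub.

```python
# Conceptual sketch of clapperboard-based audio/video sync. The "detections" here are a
# hypothetical stand-in for the output of an object-detection model; only the sub-frame
# interpolation and offset arithmetic are illustrated.
from dataclasses import dataclass

@dataclass
class ClapDetection:
    frame_index: int
    gap_px: float  # detected gap between the clapper arm and the board, in pixels

def estimate_clap_time(detections: list[ClapDetection], fps: float) -> float:
    """Interpolate the sub-frame moment the clapper gap reaches zero."""
    for prev, curr in zip(detections, detections[1:]):
        if prev.gap_px > 0 and curr.gap_px <= 0:
            # Linear interpolation between the last open frame and the first closed frame
            frac = prev.gap_px / (prev.gap_px - curr.gap_px)
            return (prev.frame_index + frac) / fps
    raise ValueError("no clap close event found")

def audio_offset_seconds(video_clap_time: float, audio_clap_time: float) -> float:
    """Shift to apply to the audio track so the clap transient lines up with the video."""
    return video_clap_time - audio_clap_time

if __name__ == "__main__":
    detections = [ClapDetection(100, 14.0), ClapDetection(101, 6.0), ClapDetection(102, 0.0)]
    t_video = estimate_clap_time(detections, fps=24.0)
    print(f"clap closes at {t_video:.4f}s; shift audio by {audio_offset_seconds(t_video, 4.21):.4f}s")
```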

Ready to try GPU encoding in the cloud?

Bitmovin VOD Encoding now supports NVIDIA GPUs for accelerated video transcoding. Specifically, we use NVIDIA T4 GPUs on AWS EC2 G4dn instances, which are available to our customers simply by using our VOD_HARDWARE_SHORTFORM preset. This enables incredibly fast turnaround times with both the H.264 and H.265 codecs, which can make a huge difference for time-critical short-form content like sports highlights and news clips. You can get started today with a Bitmovin trial and see the results for yourself.
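For reference, requesting this preset with the Bitmovin Python SDK (bitmovin-api-sdk) might look like the minimal sketch below. The class and method names follow the SDK’s published examples, the bitrate and resolution values are purely illustrative, and a complete job would also define inputs, outputs, streams and muxings, so check the current SDK documentation for exact signatures.

```python
# Sketch: creating an H.264 codec configuration that uses the GPU-accelerated short-form preset
# with the Bitmovin Python SDK. Values are illustrative; a full workflow also needs inputs,
# outputs, streams and muxings (see the Bitmovin API docs and SDK examples).
from bitmovin_api_sdk import BitmovinApi, H264VideoConfiguration, PresetConfiguration

bitmovin_api = BitmovinApi(api_key="YOUR_BITMOVIN_API_KEY")

h264_config = H264VideoConfiguration(
    name="H264 1080p - GPU short-form",
    height=1080,
    bitrate=4_500_000,
    preset_configuration=PresetConfiguration.VOD_HARDWARE_SHORTFORM,  # runs on NVIDIA T4 (G4dn) instances
)

created_config = bitmovin_api.encoding.configurations.video.h264.create(
    h264_video_configuration=h264_config
)
print(f"Created codec configuration {created_config.id}")
```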

Related Links

Blog: GPU Acceleration for cloud video encoding

Guide: How to create an encoding using hardware acceleration 

Data Sheet: Raise the bar for short form content 

GPU Acceleration for Cloud Video Encoding

Bringing Industry-leading Turnaround Times to Short Form Content

Bitmovin has recently seen an increase in demand for processing short-form video content with faster turnaround times. This post covers the most common use cases for short-form video and how our new GPU acceleration for VOD encoding adds value for those workflows. Keep reading to learn more about this powerful addition to our cloud encoding service.

Growing demand for short form video use cases

Ads

With the rise of FAST channels and more subscription services adding ad-supported tiers, video advertisements are growing beyond traditional AVOD workflows. Companies that optimized their encoding workflows for longer-form episodic and cinematic content are now looking for ways to add encoding capacity and deliver quicker turnaround times than they’ve needed in the past. Whether it’s a new sponsorship deal, a contract with a new ad network, or a whole new tier being added to the service, ads are often dumped on the video processing team in large batches with little notice and need to be ready to serve to customers almost immediately, often with revenue at stake.

News and Sports clips

Being the first to publish news clips or sports highlights offers a competitive advantage in the world of media and journalism, enhancing credibility and authority with not only the viewing audience, but also advertisers and sponsors. It increases social media visibility and the potential for virality, which provides free promotion and more impressions, leading to increased monetization. Timely publication also has direct SEO benefits and increases audience engagement and viewer loyalty, all of which increase organic traffic that can be monetized through other channels and grow the potential audience for future deals. 

User generated content (Stories, Reels, etc)

Video is no longer limited to specialist platforms and has become “table stakes” for any social app. With competition for eyeballs being tougher than ever, a smooth, seamless experience is key for keeping users engaged and delighted. Any video that gets recorded and shared needs to give the poster a feeling that it was available almost immediately, which is not a simple task. 

GPU acceleration for video encoding enables faster publishing times for sports highlights and news clips

Highlights of recent performance improvements

Bitmovin’s split-and-stitch cloud architecture enables massive horizontal scale by allowing different parts of a video to be processed simultaneously and then reassembled for playback. This approach works especially well with longer content, something we showcased when demonstrating the first video encoding that processed 100x faster than real-time. However, it adds some overhead time that becomes more noticeable when working with shorter videos. As we heard more demand for quicker turnaround of ads and short clips, earlier this year we began optimizing our workflow with this type of short-form content in mind.
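As a toy illustration of the split-and-stitch idea (not Bitmovin’s actual implementation, which distributes chunks across cloud instances and handles far more edge cases), the sketch below segments a source with FFmpeg, encodes the chunks in parallel local processes, and stitches them back together with the concat demuxer.

```python
# Toy split-and-stitch: segment the source, encode segments in parallel, then concatenate.
# A simplification for illustration only; file names and settings are arbitrary examples.
import glob
import subprocess
from concurrent.futures import ProcessPoolExecutor

def split(src: str, seconds: int = 30) -> list[str]:
    # Copy-split into ~30s chunks (splits land on keyframes when stream-copying)
    subprocess.run(["ffmpeg", "-y", "-i", src, "-c", "copy", "-map", "0",
                    "-f", "segment", "-segment_time", str(seconds),
                    "-reset_timestamps", "1", "chunk_%03d.mp4"], check=True)
    return sorted(glob.glob("chunk_*.mp4"))

def encode(chunk: str) -> str:
    out = f"enc_{chunk}"
    subprocess.run(["ffmpeg", "-y", "-i", chunk, "-c:v", "libx264",
                    "-b:v", "4M", "-c:a", "copy", out], check=True)
    return out

def stitch(chunks: list[str], dst: str) -> None:
    with open("concat.txt", "w") as f:
        f.writelines(f"file '{c}'\n" for c in chunks)
    subprocess.run(["ffmpeg", "-y", "-f", "concat", "-safe", "0",
                    "-i", "concat.txt", "-c", "copy", dst], check=True)

if __name__ == "__main__":
    encoded = list(ProcessPoolExecutor().map(encode, split("source.mp4")))
    stitch(encoded, "stitched_output.mp4")
```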

In April at NAB 2023, we started sharing some of the early results of this effort, demoing 20-30 second turnaround times for the entire 720p adaptive bitrate (ABR) ladder for a 30 second clip using the H.264 codec. This was a nice step forward, but we saw that with more GPU instances becoming available in the cloud, there was potential to make more significant leaps in progress by focusing our attention on the speed provided by this newly accessible hardware.  

GPU acceleration testing and evaluation

The first step in our GPU experimentation was selecting which GPU to use for the initial evaluation. After surveying what was available in the major public clouds and doing some estimations, we settled on Amazon EC2 G4dn instances, which are powered by NVIDIA T4 GPUs. They can deliver up to 40x better low-latency throughput than CPUs and up to twice the video transcoding capability of the previous-generation G3 instances. Those performance gains, combined with NVIDIA’s reputation and the cost-effectiveness of the G4dn instances, made them the ideal choice for our trial.

NVIDIA T4 being used by Bitmovin for GPU acceleration of cloud video encoding – source: nvidia.com

After completing some successful manual tests, the next step was to integrate the GPU instances into the Bitmovin scheduling logic so we could use them like any of the other cloud CPU instances that encoding jobs normally run on. This would allow true apples-to-apples comparisons of total turnaround times for our workflows, including queueing, analysis and encoding times. Integration with our Encoding API would also allow us to begin a closed beta trial with customers who had expressed interest in our conversations at NAB.

The initial unoptimized GPU workflow already provided some improvements in encoding time over CPU, and we were able to turn around a full 1080p ABR ladder in less time than our 720p NAB demo. Still, we saw opportunities to shave off even more time and had some teams dig deeper during an internal hackathon over the summer. Through their discoveries and ongoing work, we’ve been able to accelerate scheduling and instance retrieval and can now achieve turnaround times as low as 15 seconds for customer-provided videos up to 5 minutes long! Working with the NVIDIA GPU also allowed us to add fast-turnaround support for the H.265 (HEVC) codec in addition to H.264, which would not have been possible with CPUs alone.

“As the volume of online videos continues to grow exponentially, demand for solutions to efficiently search and gain insights from video continues to grow as well. T4 delivers extraordinary performance for AI video applications, with dedicated hardware transcoding engines that bring twice the decoding performance of prior-generation GPUs. T4 can decode up to 38 full-HD video streams, making it easy to integrate scalable deep learning into video pipelines to deliver innovative, smart video services.”

source: https://www.nvidia.com/en-us/data-center/tesla-t4/

Value add for key use cases

So what’s the end result of all of this work and optimization? We hope faster processing and turnaround times will lead to revenue growth for our customers by enabling higher ad fill rates and reducing the time between contracts being signed and ads being served to viewers. The flexibility of our cloud-native encoding solution is also key for absorbing spikes in demand and freeing them from investing in resources that may sit idle for long periods of time.

Giving our media customers the ability to be first to the consumer with news and sports clips boosts their chance to go viral while also growing brand awareness. It will have short term benefits for delivering more ad impressions and revenue along with longer term benefits like SEO and increasing organic traffic that can justify higher CPMs and larger sponsorship packages in the future.

Social and consumer apps will be able to improve the UX of their customer workflows, making video sharing seamless and interruption free with faster processing than ever before. Bitmovin’s customers will also benefit from a seamlessly improved experience, with GPU acceleration for VOD encoding being as easy to use as their existing CPU workflows. 

What’s next for Bitmovin’s GPU acceleration?

Bitmovin has always been committed to continuously improving and that still applies with our explorations into cloud GPU encoding. One area where we’ve done some preliminary testing is providing more Bitmovin API gateways in strategic geographic locations. This may help shave a few more seconds off total turnaround time for some customers, especially for more complex encoding jobs with multiple API calls.

We pride ourselves on being the cloud-agnostic encoding solution, so pending customer feedback from our beta trial period, we will look at expanding GPU-encoding capabilities to more regions and other public clouds beyond AWS. NVIDIA also has other GPU instances available in the cloud, so those may be worth exploring and benchmarking against once our initial workflows move into production mode.   

Bitmovin and the University of Klagenfurt are currently collaborating on a two-year R&D project called GAIA that aims to make video streaming more sustainable and help the industry reduce its carbon footprint. One of the areas currently being researched is the relative efficiency and carbon footprint of CPU- vs GPU-accelerated video encoding. We hope to share data and results from those experiments soon, but in the meantime, you can check out our recently published progress report here.

Interested in cloud GPU encoding?

Bitmovin customers and free trial users can learn more about our short-form video processing in the Bitmovin dashboard. We also have a short-form content datasheet available for download here. Our beta trial period will be coming to an end soon, but if you’re interested in an early preview, get in touch with your Bitmovin representative or let us know in the comments below. For more information about video encoding in general, check out our comprehensive guide here.
