AI-powered Video Super Resolution and Remastering

AI has been the hot buzzword in tech for the past couple of years, and we’re starting to see more and more practical applications for video emerging from the hype, like automatic closed-captioning and language translation, automated descriptions and summaries, and AI video Super Resolution upscaling. Bitmovin has especially focused on how AI can provide value for our customers, releasing our AI Analytics Session Interpreter earlier this year, and we’re looking closely at several other areas of the end-to-end video workflow.

We’re very proud of how our encoder maintains the visual quality of source files while significantly reducing the amount of data used, but now we’re exploring how we can actually improve on the quality of the source file for older and standard-definition content. Super Resolution implementations have come a long way in the past few years and have the potential to give older content new life and make it look amazing on Ultra-High-Definition screens. Keep reading to learn about Bitmovin’s progress and results.

What is video Super Resolution and how does it work? 

Super Resolution refers to the process of enhancing the quality or increasing the resolution of an image or video beyond its original resolution. The original methods of upscaling images and video involved upsampling with mathematical functions like bilinear and bicubic interpolation to predict new data points in between sampled data points. Some techniques used multiple lower-resolution images or video frames to create a composite higher-resolution image or frame. Modern AI and machine learning (ML) based methods involve training deep neural networks (DNNs) with large libraries of low and high-resolution image pairs. The networks learn to map the differences between the pairs, and after enough training they are able to accurately generate a high-resolution image from a lower-resolution one.
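
To make the contrast concrete, below is a minimal sketch using OpenCV: the bicubic path only interpolates between existing pixels, while the DNN path runs a pretrained Super Resolution network via the contrib dnn_superres module. The input filename and the EDSR model file are assumptions for illustration; the DNN path requires opencv-contrib-python and downloaded model weights.

```python
import cv2

# Placeholder input: a low-resolution source frame.
frame = cv2.imread("input_540p.png")

# Classic upscaling: bicubic interpolation predicts new pixels from a
# weighted average of neighboring sampled pixels.
bicubic = cv2.resize(frame, None, fx=4, fy=4, interpolation=cv2.INTER_CUBIC)

# Learning-based upscaling: a DNN trained on low/high-resolution pairs
# generates detail instead of only interpolating. Requires
# opencv-contrib-python and a downloaded EDSR model file (assumed path).
sr = cv2.dnn_superres.DnnSuperResImpl_create()
sr.readModel("EDSR_x4.pb")   # pretrained weights, assumed to be local
sr.setModel("edsr", 4)       # network name and scale factor
upscaled = sr.upsample(frame)

cv2.imwrite("bicubic_4x.png", bicubic)
cv2.imwrite("dnn_superres_4x.png", upscaled)
```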

Bitmovin’s AI video Super Resolution exploration and testing

Super Resolution upscaling is something that Bitmovin has been investigating and testing with customers for several years now. We published a 3-part deep dive back in 2020 that goes into detail about the principles behind Super Resolution, how it can be incorporated into video workflows, and the practical applications and results. We won’t fully rehash those posts here, so check them out if you’re interested in the details. But one of the conclusions we came to back then was that Super Resolution is an especially well-suited application for machine learning techniques. This is even more true now, as GPUs have become exponentially more powerful over the past four years while becoming more affordable and accessible as cloud resources.

Graph: ~1000x improvement in Nvidia GPU AI compute over the last 8 years – source: Nvidia GTC 2024 keynote

ATHENA Super Resolution research

Bitmovin’s ATHENA research lab partner has also been looking into various AI video Super Resolution approaches. In a proposed method called DeepStream, they demonstrated how a DNN enhancement layer could be included with a stream to perform Super Resolution upscaling on playback devices with capable GPUs. The results showed this method could save ~35% bitrate while delivering equivalent quality. See this link for more detail.


Other Super Resolution techniques the ATHENA team has looked at involve upscaling on mobile devices, which typically can’t take advantage of full-scale DNNs due to limited processing power and battery/power-consumption concerns. Lightweight Super Resolution networks specifically tailored for mobile devices, like LiDeR and SR-ABR Net, have shown positive early outcomes and performance.
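
As a rough illustration of why such networks can fit constrained hardware, here is a minimal PyTorch sketch of a lightweight Super Resolution model built around sub-pixel convolution, a common trick in mobile-friendly designs for cheap learned upsampling. This is an illustrative toy, not the published LiDeR or SR-ABR Net architectures, and the layer sizes are arbitrary assumptions.

```python
import torch
import torch.nn as nn

class TinySR(nn.Module):
    """Toy lightweight CNN that upscales by `scale` via sub-pixel convolution."""
    def __init__(self, scale: int = 2, channels: int = 16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            # Produce scale^2 * 3 channels, then rearrange them into pixels:
            nn.Conv2d(channels, 3 * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),  # cheap learned upsampling step
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x)

model = TinySR(scale=2).eval()
frame = torch.rand(1, 3, 270, 480)   # dummy 480x270 frame, values in [0, 1]
with torch.no_grad():
    upscaled = model(frame)          # -> shape (1, 3, 540, 960)
print(upscaled.shape)
```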

AI-powered video enhancement with Bitmovin partner Pixop

Bitmovin partner Pixop specializes in AI and ML video enhancement and upscaling. They’re also cloud-native and fellow members of NVIDIA’s Inception Startup Program. They offer several AI-powered services and filters, including restoration, Super Resolution upscaling, denoising, deinterlacing, film grain and frame rate conversion, that automate processes which used to be painstaking and time-consuming. We’ve found their services to be very complementary to Bitmovin’s VOD Encoding and have begun trials with Bitmovin customers.

One application we’re exploring is digital remastering of historic content. We’ve been able to take lower-resolution, grainy and generally lower-quality content (by today’s standards) through Pixop’s upscaling and restoration, with promising results. The encoded output was not only higher resolution; the cropping, graining and color correction applied along the way also produced a visually more appealing result, allowing our customer to re-monetize their aged content. The image below shows a side-by-side comparison of remastered content with finer details.

Side-by-side comparison of AI remastered content

Interested in giving your older content new life with the power of AI video Super Resolution? Get in touch here.

Related Links

Blog: Super Resolution Tech Deep Dive Part 1

Blog: Super Resolution Tech Deep Dive Part 2

Blog: Super Resolution Tech Deep Dive Part 3

Blog: AI Video Research

ATHENA research lab – Super Resolution projects and publications

pixop.com

144th MPEG Meeting Takeaways: Understanding Quality Impacts of Learning-based Codecs and Enhancing Green Metadata



Preface

Bitmovin has been “Shaping the Future of Video” for over 10 years now, and in addition to our own innovations, we’ve been actively taking part in standardization activities to improve the quality of video technologies for the wider industry. I have been a member and attendee of the Moving Picture Experts Group (MPEG) for 15+ years and have been documenting its progress since early 2010. Recently, we’ve been working on several new initiatives, including the use of learning-based codecs and enhanced support for more energy-efficient media consumption.

The 144th MPEG meeting highlights

The 144th MPEG meeting was held in Hannover, Germany! For those interested, the press release with all the details is available. It’s always great to see and hear about progress being made in person.

Attendees of the 144th MPEG meeting in Hannover, Germany.

The main outcomes of this meeting are as follows:

  • MPEG issues Call for Learning-Based Video Codecs for Study of Quality Assessment
  • MPEG evaluates Call for Proposals on Feature Compression for Video Coding for Machines
  • MPEG progresses ISOBMFF-related Standards for the Carriage of Network Abstraction Layer Video Data
  • MPEG enhances the Support of Energy-Efficient Media Consumption
  • MPEG ratifies the Support of Temporal Scalability for Geometry-based Point Cloud Compression
  • MPEG reaches the First Milestone for the Interchange of 3D Graphics Formats
  • MPEG announces Completion of Coding of Genomic Annotations

This post will focus on MPEG Systems-related standards and visual quality assessment. As usual, the column will end with an update on MPEG-DASH.

Visual Quality Assessment

MPEG does not create standards in the visual quality assessment domain. However, it conducts visual quality assessments for its standards during various stages of the standardization process. For instance, it evaluates responses to calls for proposals, conducts verification tests of its final standards, and so on.

MPEG Visual Quality Assessment (AG 5) issued an open call to study quality assessment for learning-based video codecs. AG 5 has been conducting subjective quality evaluations for coded video content and studying their correlation with objective quality metrics. Most of these studies have focused on the High Efficiency Video Coding (HEVC) and Versatile Video Coding (VVC) standards. To facilitate the study of visual quality, MPEG maintains the Compressed Video for the study of Quality Metrics (CVQM) dataset.
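
For context on the objective side of these studies, metrics like PSNR are computed directly from pixel differences between a reference frame and its reconstruction, and studies like AG 5’s ask how well such numbers track subjective scores. A minimal sketch, with placeholder filenames:

```python
import cv2
import numpy as np

# Placeholder filenames: a pristine reference frame and the same frame
# after an encode/decode cycle.
ref = cv2.imread("reference_frame.png").astype(np.float64)
rec = cv2.imread("decoded_frame.png").astype(np.float64)

mse = np.mean((ref - rec) ** 2)  # mean squared error per pixel
# PSNR in dB for 8-bit content; identical images would yield infinity.
psnr = float("inf") if mse == 0 else 10 * np.log10(255.0 ** 2 / mse)
print(f"PSNR: {psnr:.2f} dB")
```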

With the recent advancements in learning-based video compression algorithms, MPEG is now studying compression using these codecs. It is expected that reconstructed videos compressed using learning-based codecs will have different types of distortion compared to those induced by traditional block-based motion-compensated video coding designs. To gain a deeper understanding of these distortions and their impact on visual quality, MPEG has issued a public call related to learning-based video codecs. MPEG is open to inputs in response to the call and will invite responses that meet the call’s requirements to submit compressed bitstreams for further study of their subjective quality and potential inclusion into the CVQM dataset.

Considering the rapid advancements in the development of learning-based video compression algorithms, MPEG will keep this call open and anticipates future updates to the call.

Interested parties are kindly requested to contact the MPEG AG 5 Convenor Mathias Wien (wien@lfb.rwth-aachen.de) and submit responses for review at the 145th MPEG meeting in January 2024. Further details are given in the call, issued as AG 5 document N 104 and available from the mpeg.org website.

Learning-based data compression (e.g., for image, audio, or video content) is a hot research topic. Research on this topic relies on datasets offering a set of common test sequences, and sometimes common test conditions, that are publicly available and allow for comparison across different schemes. MPEG’s Compressed Video for the study of Quality Metrics (CVQM) dataset is such a dataset, available here, and ready to be used by researchers and scientists outside of MPEG as well. The call mentioned above is open to everyone inside and outside of MPEG and allows researchers to participate in international standards efforts (note: to attend meetings, one must become a delegate of a national body).

Bitmovin and the ATHENA research lab have been working together on ML-based enhancements to boost visual quality and improve QoE. You can read more about our published research in this blog post.


MPEG Systems Updates

At the 144th MPEG meeting, MPEG Systems (WG 3) produced three newsworthy items:

  • Progression of ISOBMFF-related standards for the carriage of Network Abstraction Layer (NAL) video data.
  • Enhancement of the support of energy-efficient media consumption.
  • Support of temporal scalability for geometry-based Point Cloud Compression (G-PCC).

ISO/IEC 14496-15, a part of the family of ISOBMFF-related standards, defines the carriage of Network Abstraction Layer (NAL) unit structured video data such as Advanced Video Coding (AVC), High Efficiency Video Coding (HEVC), Versatile Video Coding (VVC), Essential Video Coding (EVC), and Low Complexity Enhancement Video Coding (LCEVC). This standard has been further improved with the approval of the Final Draft Amendment (FDAM), which adds support for enhanced features such as Picture-in-Picture (PiP) use cases enabled by VVC.
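
For readers less familiar with the container these standards build on, an ISOBMFF file is a sequence of typed, length-prefixed boxes. The sketch below walks a file’s top-level boxes and prints their types and sizes; the filename is a placeholder, and real-world parsing needs more care than shown here.

```python
import struct

def iter_boxes(path: str):
    """Yield (type, size) for each top-level ISOBMFF box in the file."""
    with open(path, "rb") as f:
        while header := f.read(8):
            if len(header) < 8:
                break
            size, box_type = struct.unpack(">I4s", header)
            header_len = 8
            if size == 1:  # 64-bit "largesize" follows the type field
                size = struct.unpack(">Q", f.read(8))[0]
                header_len = 16
            yield box_type.decode("ascii", "replace"), size
            if size == 0:  # box extends to the end of the file
                break
            f.seek(size - header_len, 1)  # skip the box payload

for box_type, size in iter_boxes("example.mp4"):  # assumed filename
    print(f"{box_type}: {size} bytes")
```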

In addition to the improvements made to ISO/IEC 14496-15, separately developed amendments have been consolidated in the 7th edition of the standard. This edition has been promoted to Final Draft International Standard (FDIS), marking the final milestone of the formal standard development.

Another important standard in development is the 2nd edition of ISO/IEC 14496-32 (file format reference software and conformance). This standard, currently at the Committee Draft (CD) stage of development, is planned to be completed and reach the status of Final Draft International Standard (FDIS) by the beginning of 2025. It will be essential for industry professionals who require a reliable and standardized method of verifying the conformance of their implementations.

MPEG Systems (WG 3) also promoted ISO/IEC 23001-11 (energy-efficient media consumption (green metadata)) Amendment 1 to Final Draft Amendment (FDAM). This amendment introduces energy-efficient media consumption (green metadata) for Essential Video Coding (EVC) and defines metadata that enables a reduction in decoder power consumption. At the same time, ISO/IEC 23001-11 Amendment 2 has been promoted to the Committee Draft Amendment (CDAM) stage of development. This amendment introduces a novel way to carry metadata about display power reduction encoded as a video elementary stream interleaved with the video it describes. The amendment is expected to be completed and reach the status of Final Draft Amendment (FDAM) by the beginning of 2025.

Finally, MPEG Systems (WG 3) promoted ISO/IEC 23090-18 (carriage of geometry-based point cloud compression data) Amendment 1 to Final Draft Amendment (FDAM). This amendment enables the compression of a single elementary stream of point cloud data using ISO/IEC 23090-9 (geometry-based point cloud compression) and storing it in more than one track of ISO Base Media File Format (ISOBMFF)-based files. This enables support for applications that require multiple frame rates within a single file and introduces a track grouping mechanism to indicate multiple tracks carrying a specific temporal layer of a single elementary stream separately.

MPEG Systems usually provides standards on top of existing compression standards, enabling efficient storage and delivery of media data (among others). Researchers may use these standards (including reference software and conformance bitstreams) to conduct research in the general area of multimedia systems (cf. ACM MMSys) or, specifically on green multimedia systems (cf. ACM GMSys).

Enhancements to green metadata are welcome and necessary additions to the toolkit for everyone working on reducing the carbon footprint of video streaming workflows. Bitmovin and the GAIA project have been conducting focused research in this area for over a year now, and through testing, benchmarking and developing new methods, we hope to significantly improve our industry’s environmental sustainability. You can read more about our progress in this report.

MPEG-DASH Updates

The current status of MPEG-DASH is shown in the figure below with only minor updates compared to the last meeting.

MPEG-DASH Status, October 2023.

In particular, the 6th edition of MPEG-DASH is scheduled for 2024 but may not include all amendments under development. An overview of existing amendments can be found in the blog post from the last meeting. Current amendments have been (slightly) updated and progressed toward completion in the upcoming meetings. The signaling of haptics in DASH has been discussed and accepted for inclusion in the Technologies under Consideration (TuC) document. The TuC document comprises candidate technologies for possible future amendments to the MPEG-DASH standard and is publicly available here.

MPEG-DASH has been heavily researched in the multimedia systems, quality, and communications research communities. Adding haptics to MPEG-DASH would provide another dimension worth considering within research, including, but not limited to, performance aspects and Quality of Experience (QoE).


The 145th MPEG meeting will be online from January 22-26, 2024. Click here for more information about MPEG meetings and their developments.


Want to learn more about the latest research from the ATHENA lab and its potential applications? Check out this post summarizing the projects from the first cohort of finishing PhD candidates.


Notes and highlights from previous MPEG meetings can be found here.
