ATHENA’s first 5 years of research and innovation https://bitmovin.com/blog/5-years-of-research-and-innovation/ Mon, 19 Aug 2024

Since forming in October 2019, the Christian Doppler Laboratory ATHENA at Universität Klagenfurt, run by Bitmovin co-founder Dr. Christian Timmerer, has been advancing research and innovation for adaptive bitrate (ABR) streaming technologies. Over the past five years, the lab has addressed critical challenges in video streaming from encoding and delivery to playback and end-to-end quality of experience. They are breaking new ground using edge computing, machine learning, neural networks and generative AI for video applications, contributing significantly to both academic knowledge and industry applications as Bitmovin’s research partner. 

In this blog, we’ll take a look at the highlights of the ATHENA lab’s work over the past five years and its impact on the future of the streaming industry.

Publications

ATHENA has made its mark with high-impact publications on the topics of multimedia, signal processing, and computer networks. Their research has been featured in prestigious journals such as IEEE Communications Surveys & Tutorials and IEEE Transactions on Multimedia. With 94 papers published or accepted by the time of the 5-year evaluation, the lab has established itself as a leader in video streaming research.

ATHENA has also contributed to reproducibility in research. Their open source tools, the Video Complexity Analyzer and LLL-CAdViSE, have already been used by Bitmovin and others in the industry. Their open, multi-codec UHD dataset enables research and development of multi-codec playback solutions for 8K video.

ATHENA has also looked at applications of AI in video coding and streaming, something that will become more of a focus over the next two years. You can read more about ATHENA’s AI video research in this blog post.

Patents

But it’s not all just theoretical research. The ATHENA lab has successfully translated its findings into practical solutions, filing 16 invention disclosures and 13 patent applications. As of publication, 6 patents have been granted.

Workflow diagram for Fast Multi-Rate Encoding using convolutional neural networks. More detail available here.

PhDs

ATHENA has also made an educational impact, guiding its inaugural cohort of seven PhD students to successful dissertation defenses, with research topics ranging from edge computing in video streaming to machine learning applications in video coding.

There are also two postdoctoral scholars in the lab who have made significant contributions and progress.

Practical applications with Bitmovin

As Bitmovin’s academic partner, ATHENA plays a critical role in developing and enhancing technologies that can differentiate our streaming solutions. As ATHENA’s company partner, Bitmovin helps guide and test practical applications of the research, with regular check-ins for in-depth discussions about new innovations and potential technology transfers. The collaboration has resulted in several advancements over the years, including recent projects like CAdViSE and WISH ABR. 

CAdViSE

CAdViSE (Cloud-based Adaptive Video Streaming Evaluation) is a framework for automated testing of media players. It allows you to test how different players and ABR configurations perform and react to fluctuations in different network parameters. Bitmovin is using CAdViSE to evaluate the performance of different custom ABR algorithms. The code is available in this GitHub repo.

WISH ABR

WISH stands for Weighted Sum model for HTTP Adaptive Streaming, and it allows for customization of ABR logic for different devices and applications. WISH’s logic is based on a model that weighs the bandwidth, buffer and quality costs of playing back a segment. By setting weights for the importance of those metrics, you create a custom ABR algorithm optimized for your content and use case. You can learn more about WISH ABR in this blog post; a simplified sketch of the weighted decision logic follows the figure below.

Visual illustration of WISH ABR research and innovation from ATHENA and Bitmovin.
Decision process for WISH ABR, weighing data/bandwidth cost, buffer cost, and quality cost of each segment.
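To make the weighted-sum idea concrete, here is a minimal, illustrative sketch of that kind of decision logic in Python. It is not the published WISH algorithm or Bitmovin player code; the candidate ladder, weights and cost definitions are hypothetical placeholders.

```python
# Minimal sketch of a weighted-sum ABR decision (illustrative only, not the
# published WISH implementation). Renditions, weights and cost definitions
# below are hypothetical placeholders.

from dataclasses import dataclass

@dataclass
class Rendition:
    bitrate_kbps: int   # encoded bitrate of the candidate segment
    quality: float      # e.g. normalized VMAF in [0, 1]

def segment_cost(r: Rendition, throughput_kbps: float, buffer_s: float,
                 segment_s: float, w_bw: float, w_buf: float, w_q: float) -> float:
    # Bandwidth cost: how much of the measured throughput this rendition uses.
    bandwidth_cost = r.bitrate_kbps / max(throughput_kbps, 1.0)
    # Buffer cost: penalize choices whose download time eats into the buffer.
    download_s = segment_s * r.bitrate_kbps / max(throughput_kbps, 1.0)
    buffer_cost = max(0.0, download_s - buffer_s) / segment_s
    # Quality cost: distance from the best possible quality.
    quality_cost = 1.0 - r.quality
    return w_bw * bandwidth_cost + w_buf * buffer_cost + w_q * quality_cost

def choose_rendition(ladder, throughput_kbps, buffer_s, segment_s=4.0,
                     w_bw=0.15, w_buf=0.35, w_q=0.5):
    # Pick the rendition with the lowest weighted cost for the next segment.
    return min(ladder, key=lambda r: segment_cost(
        r, throughput_kbps, buffer_s, segment_s, w_bw, w_buf, w_q))

ladder = [Rendition(800, 0.55), Rendition(2400, 0.78), Rendition(6000, 0.93)]
print(choose_rendition(ladder, throughput_kbps=5000, buffer_s=12.0))
```

Changing the weights shifts the behavior: increasing the quality weight favors higher renditions at the risk of slower downloads, while increasing the buffer weight makes the choice more conservative, which is the per-device tuning idea behind WISH.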

Project spinoffs

The success of ATHENA has led to three spinoff projects:

APOLLO

APOLLO is funded by the Austrian Research Promotion Agency FFG and is a cooperative project between Bitmovin and Alpen-Adria-Universität Klagenfurt. The main objective of APOLLO is to research and develop an intelligent video platform for HTTP adaptive streaming that distributes video transcoding across large- and small-scale computing environments, using AI and ML techniques to manage that distribution.

GAIA

GAIA is also funded by the Austrian Research Promotion Agency FFG and is a cooperative project between Bitmovin and Alpen-Adria-Universität Klagenfurt. The GAIA project researches and develops a climate-friendly adaptive video streaming platform that provides complete energy awareness and accountability along the entire delivery chain. It also aims to reduce energy consumption and GHG emissions through advanced analytics and optimizations on all phases of the video delivery chain.

SPIRIT

SPIRIT (Scalable Platform for Innovations on Real-time Immersive Telepresence) is an EU Horizon Europe-funded innovation action. It brings together cutting-edge companies and universities in the field of telepresence applications with advanced and complementary expertise in extended reality (XR) and multimedia communications. SPIRIT’s mission is to create Europe’s first multisite and interconnected framework capable of supporting a wide range of application features in collaborative telepresence.

What’s next

Over the next two years, the ATHENA project will focus on advancing deep neural network and AI-driven techniques for image and video coding. This work will include making video coding more energy- and cost-efficient, exploring immersive formats like volumetric video and holography, and enhancing QoE while being mindful of energy use. Other focus areas include AI-powered, energy-efficient live video streaming and generative AI applications for adaptive streaming. 

Get in touch or let us know in the comments if you’d like to learn more about Bitmovin and ATHENA’s research and innovation, AI, or sustainability-related projects.

The AI Video Research Powering a Higher Quality Future https://bitmovin.com/blog/ai-video-research/ Sun, 05 May 2024

*This post was originally published in June 2023. It was updated in May 2024 with more recent research publications and updates.*

This post will summarize the current state of Artificial Intelligence (AI) applications for video in 2024, including recent progress and announcements. We’ll also take a closer look at AI video research and collaboration between Bitmovin and the ATHENA laboratory that has the potential to deliver huge leaps in quality improvements and bring an end to playback stalls and buffering. This includes ATHENA’s FaRes-ML, which was recently granted a US Patent. Keep reading to learn more!

AI for video at NAB 2024

At NAB 2024, the AI hype train continued gaining momentum and we saw more practical applications of AI for video than ever before. We saw various uses of AI-powered encoding optimization, Super Resolution upscaling, automatic subtitling and translations, and generative AI video descriptions and summarizations. Bitmovin also presented some new AI-powered solutions, including our Analytics Session Interpreter, which won a Best of Show award from TV Technology. It uses machine learning and large language models to generate a summary, analysis and recommendations for every viewer session. The early feedback has been positive and we’ll continue to refine and add more capabilities that will help companies better understand and improve their viewers’ experience.

L to R: Product Manager Jacob Arends, CEO Stefan Lederer and Engineer Peter Eder accepting the award for Bitmovin’s AI-powered Analytics Session Interpreter

Other AI highlights from NAB included Jan Ozer’s “Beyond the Hype: A Critical look at AI in Video Streaming” presentation, NETINT and Ampere’s live subtitling demo using OpenAI Whisper, and Microsoft and Mediakind sharing AI applications for media and entertainment workflows. You can find more detail about these sessions and other notable AI solutions from the exhibition floor in this post.

FaRes-ML granted US Patent

For a few years before this recent wave of interest, Bitmovin and our ATHENA project colleagues have been researching the practical applications of AI for video streaming services. It’s something we’re exploring from several angles, from boosting visual quality and upscaling older content to more intelligent video processing for adaptive bitrate (ABR) switching. One of the projects that was first published in 2021 (and covered below in this post) is Fast Multi-Resolution and Multi-Rate Encoding for HTTP Adaptive Streaming Using Machine Learning (FaRes-ML). We’re happy to share that FaRes-ML was recently granted a US Patent! Congrats to the authors, Christian Timmerer, Hadi Amirpour, Ekrem Çetinkaya and the late Prof. Mohammad Ghanbari, who sadly passed away earlier this year.

Recent Bitmovin and ATHENA AI Research

In this section, I’ll give a short summary of projects that were shared and published since the original publication of this blog, and link to details for anyone interested in learning more. 

Generative AI for Adaptive Video Streaming

Presented at the 2024 ACM Multimedia Systems Conference, this research proposal outlines the opportunities at the intersection of advanced AI algorithms and digital entertainment for elevating quality, increasing user interactivity and improving the overall streaming experience. Research topics that will be investigated include AI generated recommendations for user engagement and AI techniques for reducing video data transmission. You can learn more here.

DeepVCA: Deep Video Complexity Analyzer

The ATHENA lab developed and released the open-source Video Complexity Analyzer (VCA) to extract and predict video complexity faster than existing methods like ITU-T’s Spatial Information (SI) and Temporal Information (TI). DeepVCA extends VCA using deep neural networks to accurately predict video encoding parameters, like bitrate, and the encoding time of video sequences. The spatial complexity of the current frame and previous frame are used to rapidly predict the temporal complexity of a sequence, and the results show significant improvements over unsupervised methods. You can learn more and access the source code and dataset here.

DeepVCA’s spatial and temporal complexity prediction process
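For intuition, here is a heavily simplified sketch of frame-level spatial and temporal complexity features in the spirit of VCA. The real VCA uses DCT-energy-based texture features and DeepVCA adds a deep neural network on top; the Sobel-like gradient and frame-difference proxies below are illustrative simplifications only.

```python
# Illustrative sketch of per-frame spatial/temporal complexity features.
# Not the actual VCA/DeepVCA feature extraction (which is DCT-energy based
# and, for DeepVCA, learned); these are simple proxies for illustration.

import numpy as np

def spatial_complexity(frame: np.ndarray) -> float:
    # Mean gradient magnitude of the luma plane (finite differences).
    f = frame.astype(np.float32)
    gx = np.abs(np.diff(f, axis=1)).mean()
    gy = np.abs(np.diff(f, axis=0)).mean()
    return float(gx + gy)

def temporal_complexity(frame: np.ndarray, prev: np.ndarray) -> float:
    # Mean absolute difference between consecutive luma frames.
    return float(np.abs(frame.astype(np.float32) - prev.astype(np.float32)).mean())

def analyze(frames):
    # frames: iterable of 2-D uint8 luma arrays; returns per-frame
    # (spatial, temporal) pairs, loosely analogous to VCA's features.
    prev, results = None, []
    for frame in frames:
        e = spatial_complexity(frame)
        h = 0.0 if prev is None else temporal_complexity(frame, prev)
        results.append((e, h))
        prev = frame
    return results

# Tiny synthetic example: a flat frame followed by a noisy one.
rng = np.random.default_rng(0)
f0 = np.full((64, 64), 128, dtype=np.uint8)
f1 = rng.integers(0, 255, (64, 64), dtype=np.uint8)
print(analyze([f0, f1]))
```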

DIGITWISE: Digital Twin-based Modeling of Adaptive Video Streaming Engagement

DIGITWISE leverages the concept of a digital twin, a digital replica of an actual viewer, to model user engagement based on past viewing sessions. The digital twin receives input about streaming events and utilizes supervised machine learning to predict user engagement for a given session. The system model consists of a data processing pipeline, machine learning models acting as digital twins, and a unified model to predict engagement (XGBoost). The DIGITWISE system architecture demonstrates the importance of personal user sensitivities, reducing user engagement prediction error by up to 5.8% compared to non-user-aware models. It can also be used to optimize content provisioning and delivery by identifying the features that maximize engagement, providing an average engagement increase of up to 8.6%. You can learn more here.

System overview of DIGITWISE user engagement prediction
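As a rough sketch of the DIGITWISE-style setup, the snippet below trains a gradient-boosted model (XGBoost, the unified model named in the paper) on per-session streaming features to predict engagement. The feature set and synthetic data are hypothetical placeholders, not the actual DIGITWISE dataset or feature definitions.

```python
# Hedged sketch of an engagement predictor in the spirit of DIGITWISE.
# Feature names and synthetic data are hypothetical, for illustration only.

import numpy as np
import xgboost as xgb

rng = np.random.default_rng(42)
n = 2000

# Hypothetical per-session features: startup time (s), rebuffering ratio,
# average bitrate (kbps), number of quality switches.
X = np.column_stack([
    rng.uniform(0.5, 8.0, n),      # startup_time_s
    rng.uniform(0.0, 0.15, n),     # rebuffer_ratio
    rng.uniform(500, 8000, n),     # avg_bitrate_kbps
    rng.integers(0, 60, n),        # quality_switches
])

# Synthetic "engagement" target (fraction of content watched): worse QoE -> lower.
y = np.clip(
    0.9 - 0.03 * X[:, 0] - 2.0 * X[:, 1] + 0.00002 * X[:, 2] - 0.004 * X[:, 3]
    + rng.normal(0, 0.05, n),
    0.0, 1.0,
)

model = xgb.XGBRegressor(n_estimators=200, max_depth=4, learning_rate=0.1)
model.fit(X, y)

# Predict engagement for one new session (the "digital twin" inference step).
session = np.array([[1.2, 0.01, 4500, 8]])
print(f"predicted engagement: {model.predict(session)[0]:.2f}")
```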

Previous Bitmovin and ATHENA AI Research

Better quality with neural network-driven Super Resolution upscaling

The first group of ATHENA publications we’re looking at all involve the use of neural networks to drive visual quality improvements using Super Resolution upscaling techniques. 

DeepStream: Video streaming enhancements using compressed deep neural networks

Deep learning-based approaches keep getting better at enhancing and compressing video, but the quality of experience (QoE) improvements they offer are usually only available to devices with GPUs. This paper introduces DeepStream, a scalable, content-aware per-title encoding approach to support both CPU-only and GPU-available end-users. To support backward compatibility, DeepStream constructs a bitrate ladder based on any existing per-title encoding approach, with an enhancement layer for GPU-available devices. The added layer contains lightweight video super-resolution deep neural networks (DNNs) for each bitrate-resolution pair of the bitrate ladder. For GPU-available end-users, this means ~35% bitrate savings while maintaining equivalent PSNR and VMAF quality scores, while CPU-only users receive the video as usual. You can learn more here.

DeepStream system architecture

LiDeR: Lightweight video Super Resolution for mobile devices

Although DNN-based Super Resolution methods like DeepStream show huge improvements over traditional methods, their computational complexity makes it hard to use them on devices with limited power, like smartphones. Recent improvements in mobile hardware, especially GPUs, made it possible to use DNN-based techniques, but existing DNN-based Super Resolution solutions are still too complex. This paper proposes LiDeR, a lightweight video Super Resolution network specifically tailored toward mobile devices. Experimental results show that LiDeR can achieve competitive Super Resolution performance with state-of-the-art networks while improving the execution speed significantly. You can learn more here or watch the video presentation from an IEEE workshop.

Quantitative results comparing Super Resolution methods. LiDeR achieves near equivalent PSNR and SSIM quality scores while running ~3 times faster than its closest competition.

Super Resolution-based ABR for mobile devices

This paper introduces another new lightweight Super Resolution network, SR-ABR Net, that can be deployed on mobile devices to upgrade low-resolution/low-quality videos while running in real-time. It also introduces a novel ABR algorithm, WISH-SR, that leverages Super Resolution networks at the client to improve the video quality depending on the client’s context. By taking into account device properties, video characteristics, and user preferences, it can significantly boost the visual quality of the delivered content while reducing both bandwidth consumption and the number of stalling events. You can learn more here or watch the video presentation from Mile High Video.

System architecture for proposed Super Resolution based adaptive bitrate algorithm

Less buffering and higher QoE with applied machine learning

The next group of research papers involve applying machine learning at different stages of the video workflow to improve QoE for the end user.

FaRes-ML: Fast multi-resolution, multi-rate encoding

Fast multi-rate encoding approaches aim to address the challenge of encoding multiple representations from a single video by re-using information from already encoded representations. In this paper, a convolutional neural network is used to speed up both multi-rate and multi-resolution encoding for ABR streaming. Experimental results show that the proposed method for multi-rate encoding can reduce the overall encoding time by 15.08% and parallel encoding time by 41.26%. Simultaneously, the proposed method for multi-resolution encoding can reduce the encoding time by 46.27% for the overall encoding and 27.71% for the parallel encoding on average. You can learn more here.

FaRes-ML flowchart

ECAS-ML: Edge assisted adaptive bitrate switching

As video streaming traffic in mobile networks increases, utilizing edge computing support is a key way to improve the content delivery process. At an edge node, we can deploy ABR algorithms with a better understanding of network behavior and access to radio and player metrics. This project introduces ECAS-ML, Edge Assisted Adaptation Scheme for HTTP Adaptive Streaming with Machine Learning. It uses machine learning techniques to analyze radio throughput traces and balance the tradeoffs between bitrate, segment switches and stalls to deliver a higher QoE, outperforming other client-based and edge-based ABR algorithms. You can learn more here.

ECAS-ML system architecture

Challenges ahead

The road from research to practical implementation is not always quick or direct, and in some cases not possible at all, but fortunately that’s an area where Bitmovin and ATHENA have been working together closely for several years now. Going back to our initial implementation of HEVC encoding in the cloud, we’ve had success using small trials and experiments with Bitmovin’s clients and partners to provide real-world feedback for the ATHENA team, informing the next round of research and experimentation toward creating viable, game-changing solutions. This innovation-to-product cycle is already in progress for the research mentioned above, with promising early quality and efficiency improvements.

Many of the advancements we’re seeing in AI are the result of aggregating lots and lots of processing power, which in turn means lots of energy use. Even with processors becoming more energy efficient, the sheer volume involved in large-scale AI applications means energy consumption can be a concern, especially with increasing focus on sustainability and energy efficiency.  From that perspective, for some use cases (like Super Resolution) it will be worth considering the tradeoffs between doing server-side upscaling during the encoding process and client-side upscaling, where every viewing device will consume more power.  

Learn more

Want to learn more about Bitmovin’s AI video research and development? Check out the links below. 

Analytics Session Interpreter webinar

AI-powered video Super Resolution and Remastering

Super Resolution blog series

Super Resolution with Machine Learning webinar

ATHENA research

MPEG Meeting Updates 

GAIA project blogs

AI Video Glossary

Machine Learning – Machine learning is a subfield of artificial intelligence that deals with developing algorithms and models capable of learning and making predictions or decisions based on data. It involves training these algorithms on large datasets to recognize patterns and extract valuable insights. Machine learning has diverse applications, such as image and speech recognition, natural language processing, and predictive analytics.

Neural Networks – Neural networks are sophisticated algorithms designed to replicate the behavior of the human brain. They are composed of layers of artificial neurons that analyze and process data. In the context of video streaming, neural networks can be leveraged to optimize video quality, enhance compression techniques, and improve video annotation and content recommendation systems, resulting in a more immersive and personalized streaming experience for users.

Super Resolution – Super Resolution upscaling is an advanced technique used to enhance the quality and resolution of images or videos. It involves using complex algorithms and computations to analyze the available data and generate additional details. By doing this, the image or video appears sharper, clearer, and more detailed, creating a better viewing experience, especially on 4K and larger displays. 

Graphics Processing Unit (GPU) – A GPU is a specialized hardware component that focuses on handling and accelerating graphics-related computations. Unlike the central processing unit (CPU), which handles general-purpose tasks, the GPU is specifically designed for parallel processing and rendering complex graphics, such as images and videos. GPUs are widely used in various industries, including gaming, visual effects, scientific research, and artificial intelligence, due to their immense computational power.

Video Understanding – Video understanding is the ability to analyze and comprehend the information present in a video. It involves breaking down the visual content, movements, and actions within the video to make sense of what is happening.

NAB Video AI Highlights https://bitmovin.com/blog/nab-video-ai/ Fri, 26 Apr 2024


For the past few years, AI has been one of the top buzzwords at the NAB Show. While other hot topics like “web3” seem to have peaked and faded, interest in video AI has continued to grow and this year there were more practical solutions being showcased than ever before. A personal highlight for Bitmovin was winning a TV Technology Best of Show award for our AI-powered Analytics session interpreter. Keep reading to learn more about other interesting and useful applications of AI that we saw at NAB 2024.

NAB Video AI Highlights: 2024

While there was some variation in implementation and features, the majority of the AI solutions I encountered at NAB fell into one of these categories:

  • Generative AI (genAI) for video creation, post-production, or summaries and descriptions
  • Automatic subtitling and captioning with multi-language translations
  • Object or event detection and indexing
  • Video quality enhancement

This summary is definitely not exhaustive, but highlights some of the things that stood out to me on the show floor and in the conference sessions. Please let us know in the comments if you saw anything else noteworthy.

Booths and Exhibits

Adobe

Adobe has been showing AI-powered editing and post-production tools as part of their creative suite for a couple of years now and they seem to be continuously improving. They teased a new Firefly video model, coming to Premiere Pro later this year, that will enable a few new Photoshop-like tools for video. Generative Extend will allow you to extend clips with AI-generated frames for perfectly timed edits, and the new Firefly model will also enable object removal, addition, and replacement. They’ve also implemented content credentials into the platform that will signal when generative AI was used in the creation process and which models were used, as they prepare to support third-party genAI models like OpenAI’s Sora.

Amazon Web Services (AWS)

AWS had one of the busiest booths in the West Hall, showcasing several AI-powered solutions, including genAI for creating personalized ads and Intel’s Video Super Resolution upscaling. But they also had the most eye-catching and fun application of AI in the South Hall: a genAI golf simulator where you could design and play your own course.

AWS GenAI-powered golf simulator

axle.ai

Axle.ai was sharing their face, object, and logo recognition technology that can index recognized objects and search for matching objects in other videos or clips. Their software also has automatic voice transcription and translation capabilities. It can run either on-premises or in the cloud and integrates with Adobe Premiere, Final Cut Pro and other editing suites. While other companies offer similar capabilities, they stood out as being particularly focused on these use cases.

BLUEDOT

BLUEDOT was showcasing a few different solutions for improving QoE in the encoding and processing stage. Their DeepField-SR video super resolution product uses a proprietary deep neural network to upscale video up to 4K resolution, leveraging FPGAs. They were also showing AI-driven perceptual quality optimized video encoding.

BLUEDOT’s AI-driven perceptual quality optimization- image source: blue-dot.io

Twelve Labs

Twelve Labs was featuring their multimodal AI for Media & Entertainment workflows, aiming to bring human-like understanding to video content. They use both video and audio information to inform object and event detection and indexing.  This enables you to easily find moments in a video, like when a certain player scores or when a product is mentioned. They also power generative text descriptions of videos and clips. Their solution seemed more flexible than others I saw and can be integrated into media asset management systems, editing software or OTT streaming workflows.

Conference Sessions and Presentations

Beyond the Hype: A Critical look at AI in Video Streaming

In this session, as the title suggests, Jan Ozer took a close look at the current state of AI applications for video streaming workflows. He conducted several interviews with executives and product leaders ahead of NAB and shared his notes and links to the full interviews. He also called out a few times that many of the companies featured, including Bitmovin, have been researching and working on AI-powered video solutions for several years now, even before the current wave of hype. He shared Bitmovin’s new Analytics session interpreter and our Super Resolution capabilities, which you can hear more about in his interview with our VP of Product, Reinhard Grandl.

Jan Ozer’s interview with Bitmovin’s Reinhard Grandl for his Beyond the Hype NAB presentation

Some other things that stood out for me included Interra Systems’ BATON Captions, which uses natural language processing to break caption text across lines in a more natural, human-readable way. This is a small, subtle feature that can really make a big difference in improving accessibility and the viewer experience, and one I haven’t heard anyone else focus on. DeepRender also caught my attention with their claims of an AI-based video codec that will have 45% better compression than VVC by the end of 2024. That’s a really bold claim and I’ll be watching to see if they live up to the hype. Video of the session is available here, thanks to Dan Rayburn and the Streaming Summit.

Running OpenAI’s Whisper Automatic Speech Recognition on a Live Video Transcoding Server

This was a joint presentation led by NETINT’s COO Alex Liu and Ampere’s Chief Evangelist Sean Varley. They presented a practical demo of real-time live transcoding and subtitling using NETINT’s T1U Video Processing Unit (VPU) together with Ampere’s Altra Max CPU running OpenAI Whisper. The NETINT VPU is capable of creating dozens of simultaneous adaptive bitrate outputs with H.264, H.265 and AV1 codecs. The Ampere processor was being positioned as a more environmentally-friendly option for AI inference workflows, consuming less power than similarly capable GPUs. While there were some hiccups with the in-room A/V system, the live captioning demo was impressive and worked very well. Video of the session is available here, again thanks to Dan Rayburn and the Streaming Summit.

Sean Varley and Alex Liu presenting NETINT and Ampere’s Live transcoding and subtitling workflow at NAB 2024

Leveraging Azure AI for Media Production and Content Monetization Workflows

Microsoft’s Andy Beach and MediaKind’s Amit Tank led this discussion and showcase of using genAI in media and entertainment workflows. They discussed how AI can help with each part of the production and delivery workflow to boost monetization. This included things like brand detection, contextual ad placements, metadata automation, translations, captioning and personalization. One area they discussed that I hadn’t heard anyone else talk about was using AI for content localization, not just for language translation via captions and dubbing, but for compliance with local and regional norms and in some cases regulations. For example, some areas and countries may prefer or even require removal or censorship of things like alcohol and drug use or guns and excessive violence, so AI can help automate content preparation in different ways for a global audience. They also shared their own personal “most-used” AI applications, which included Microsoft’s Copilot and related AI add-ons to Teams and other Microsoft products.

Video AI use cases across the media supply chain, presented by Microsoft and MediaKind at NAB 2024

Did you see an interesting or innovative use of AI at NAB that wasn’t mentioned here? Please let us know in the comments!

AI-powered Video Super Resolution and Remastering https://bitmovin.com/blog/ai-video-super-resolution/ Fri, 12 Apr 2024

AI has been the hot buzz word in tech the past couple of years and we’re starting to see more and more practical applications for video emerging from the hype, like automatic closed-captioning and language translation, automated descriptions and summaries, and AI video Super Resolution upscaling. Bitmovin has especially focused on how AI can provide value for our customers, releasing our AI Analytics Session Interpreter earlier this year and we’re looking closer at several other areas of the end-to-end video workflow.

We’re very proud of how our encoder maintains the visual quality of the source files, while significantly reducing the amount of data used, but now we’re exploring how we can actually improve on the quality of the source file for older and standard definition content. Super Resolution implementations have come a long way in the past few years and have the potential to give older content new life and make it look amazing on Ultra-High Definition screens. Keep reading to learn about Bitmovin’s progress and results. 

What is video Super Resolution and how does it work? 

Super Resolution refers to the process of enhancing the quality or increasing the resolution of an image or video beyond its original resolution. The original methods of upscaling images and video involved upsampling by using mathematical functions like bilinear and bicubic interpolation to predict new data points in between sampled data points. Some techniques used multiple lower-resolution images or video frames to create a composite higher resolution image or frame. Now AI and machine learning (ML) based methods involve training deep neural networks (DNNs) with large libraries of low and high-resolution image pairs. The networks learn to map the differences between the pairs, and after enough training they are able to accurately generate a high-resolution image from a lower-resolution one. 
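To illustrate the difference between the two families of methods, here is a minimal Python sketch that upscales a frame with classical bicubic interpolation and with a pretrained DNN-based Super Resolution model. This is not Bitmovin’s or ATHENA’s implementation; it assumes opencv-contrib-python is installed and that the OpenCV-published EDSR model file ("EDSR_x4.pb") has been downloaded separately, and the input file name is a placeholder.

```python
# Minimal sketch contrasting classical interpolation with DNN-based Super
# Resolution. Assumes opencv-contrib-python and a locally downloaded
# pretrained EDSR model ("EDSR_x4.pb"); the input frame is a placeholder.

import cv2

low_res = cv2.imread("input_480p_frame.png")  # hypothetical input frame

# 1) Classical upscaling: bicubic interpolation predicts new pixels from
#    neighbouring samples only, so fine detail stays soft.
bicubic_4x = cv2.resize(low_res, None, fx=4, fy=4, interpolation=cv2.INTER_CUBIC)

# 2) DNN-based Super Resolution: a network trained on low/high-resolution
#    pairs reconstructs detail learned from its training data.
sr = cv2.dnn_superres.DnnSuperResImpl_create()
sr.readModel("EDSR_x4.pb")   # pretrained weights, downloaded separately
sr.setModel("edsr", 4)       # model name and upscaling factor
dnn_4x = sr.upsample(low_res)

cv2.imwrite("bicubic_4x.png", bicubic_4x)
cv2.imwrite("dnn_super_resolution_4x.png", dnn_4x)
```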

Bitmovin’s AI video Super Resolution exploration and testing

Super Resolution upscaling is something that Bitmovin has been investigating and testing with customers for several years now. We published a 3-part deep dive back in 2020 that goes into detail about the principles behind Super Resolution, how it can be incorporated into video workflows and the practical applications and results. We won’t fully rehash those posts here, so check them out if you’re interested in the details. But one of the conclusions we came to back then, was that Super Resolution was an especially well-suited application for machine learning techniques. This is even more true now, as GPUs have gotten exponentially more powerful over the past 4 years, while becoming more affordable and accessible as cloud resources. 

Nvidia’s GPU computation capabilities over the last 8 years – source: Nvidia GTC 2024 keynote 

ATHENA Super Resolution research

Bitmovin’s ATHENA research lab partner has also been looking into various AI video Super Resolution approaches. In a proposed method called DeepStream, they demonstrated how a DNN enhancement-layer could be included with a stream to perform Super Resolution upscaling on playback devices with capable GPUs. The results showed this method could save ~35% bitrate while delivering equivalent quality. See this link for more detail. 


Other Super Resolution techniques the ATHENA team has looked at involve upscaling on mobile devices that typically can’t take advantage of DNNs due to lack of processing power and power consumption/battery concerns. Lightweight Super Resolution networks specifically tailored for mobile devices like LiDeR and SR-ABR Net have shown positive early outcomes and performance. 

AI-powered video enhancement with Bitmovin partner Pixop

Bitmovin partner Pixop specializes in AI and ML video enhancement and upscaling. They’re also cloud native and fellow members of NVIDIA’s Inception Startup Program. They offer several AI-powered services and filters including restoration, Super Resolution upscaling, denoising, deinterlacing, film grain and frame rate conversion that automate tedious processes that used to be painstaking and time consuming. We’ve found them to be very complementary to Bitmovin’s VOD Encoding and have begun trials with Bitmovin customers. 

One application we’re exploring is digital remastering of historic content. We’ve been able to take lower resolution, grainy and generally lower quality content (by today’s standards) through Pixop’s upscaling and restoration, with promising results. The encoded output was not only a higher resolution, but also the application of cropping, graining and color correction resulted in a visually more appealing result, allowing our customer to re-monetize their aged content. The image below shows a side-by-side comparison of remastered content with finer details.

Side-by-side comparison of AI remastered content

Interested in giving your older content new life with the power of AI video Super Resolution? Get in touch here.

Related Links

Blog: Super Resolution Tech Deep Dive Part 1

Blog: Super Resolution Tech Deep Dive Part 2

Blog: Super Resolution Tech Deep Dive Part 3

Blog: AI Video Research

ATHENA research lab – Super Resolution projects and publications

pixop.com

NVIDIA GTC24: Highlights for Video Streaming Workflows https://bitmovin.com/blog/nvidia-gtc-video-streaming-highlights/ Fri, 05 Apr 2024


NVIDIA GTC Video Streaming Workflow Highlights

NVIDIA GTC (GPU Technology Conference) is an annual conference with training and exhibitions covering all aspects of GPU (Graphics Processing Unit) accelerated computing. GTC 2024 was held in March with the tagline “The Conference for the Era of AI” and, as expected, generative AI was a huge focus this year. There were also several other emerging applications of AI, including advanced robotics, autonomous vehicles, climate modeling and new drug discovery. 

Selfie outside NVIDIA GTC24

When GPUs were first introduced, they were mainly used for rendering graphics in video game systems. In the mid-late ‘90s, NVIDIA’s programmable GPUs opened up new possibilities for accelerated video decoding and transcoding workflows. Even though GPUs may now be more associated with powering AI solutions, they still play an important role for many video applications and there were several sessions and announcements covering the latest video-related updates at GTC24. Keep reading to learn more about the highlights. 

Video technology updates

In a session titled NVIDIA GPU Video Technologies: New Features, Improvements, and Cloud APIs, Abhijit Patait, Sr. Director of Multimedia and AI at NVIDIA, shared the latest updates and new features available for processing video with their GPUs. Some highlights that are now available in NVIDIA’s Video Codec SDK 12.2:

  • 15% quality improvement for HEVC encoding, thanks to several enhancements:
    • UHQ (Ultra-high quality) tuning info for latency-tolerant use cases
    • Increased lookahead analysis
    • Temporal filtering for noise reduction
    • Unidirectional B-frames for latency-sensitive use cases
  • Encode 8-bit content as 10-bit for higher quality (HEVC and AV1)
Comparison of HEVC encodings with equivalent quality using 18Mbps with HQ tuning, but only 10Mbps with the new UHQ tuning – source: GTC24

There were also several “Connect with Experts” sessions held where attendees could meet and ask questions of various NVIDIA subject matter experts. In the Building Efficient Video Transcoding Pipelines Enabling 8K session, they shared how multiple NVENC instances can be used in parallel for split-frame encoding to speed up 8K transcoding workflows. This topic is also covered in detail in their developer blog here.

Split frame encoding with NVIDIA GPUs – source: NVIDIA developer blog

VMAF-CUDA: Faster video quality analysis

Snap and NVIDIA gave a joint presentation around a collaborative project they worked on (including participation from Netflix) to optimize and implement VMAF (Video Multi-Method Assessment Fusion) quality calculations on NVIDIA CUDA cores. CUDA (Compute Unified Device Architecture) cores are general-purpose processing units available on NVIDIA GPUs that allow for parallel processing and applications that are complementary to the dedicated GPU circuits.

NVIDIA GPU video capabilities and components – source: nvidia.com

During the talk, they explained how implementing VMAF-CUDA enabled Snap to run their video quality assessments in parallel to the transcoding being done on NVIDIA GPUs. The new method runs several times faster and more efficiently than running VMAF on CPU instances. It was so successful that Snap is now planning to transition all VMAF calculations to GPUs, even for transcoding workflows that are CPU-based. They also published the technical details in this blog post for those interested in learning more. 

VMAF calculation speed comparison, GPU vs CPU – source: NVIDIA developer blog
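For readers who want to try VMAF themselves, the snippet below shows the standard open-source way to score a rendition against its source using FFmpeg’s libvmaf filter. This is the regular CPU path; the GPU-accelerated VMAF-CUDA path described in the talk requires an FFmpeg/libvmaf build with CUDA support, and the file names are placeholders.

```python
# Hedged sketch: computing VMAF for a transcoded rendition against its source
# with FFmpeg's libvmaf filter (standard CPU path). File names are placeholders;
# the CUDA-accelerated path needs a CUDA-enabled FFmpeg/libvmaf build.

import subprocess

cmd = [
    "ffmpeg",
    "-i", "distorted_1080p.mp4",   # the encoded rendition to score
    "-i", "reference_source.mp4",  # the pristine reference
    "-lavfi", "libvmaf=log_path=vmaf.json:log_fmt=json",
    "-f", "null", "-",
]
subprocess.run(cmd, check=True)
print("VMAF scores written to vmaf.json")
```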

Netflix Vision AI workflows

In a joint presentation by Netflix and NVIDIA, Streamed Video Processing for Cloud-Scale Vision AI Services, they shared how Netflix is using computer vision and AI at scale throughout their stack. Netflix is a bit unique not only in their massive scale, but also that they are vertically integrated and have people working on every part of the chain from content creation through distribution. This opens a lot of opportunities for using AI along with the challenge of deploying solutions at scale. 

They shared examples from: 

  • Pre-production: Storyboarding, Pre-visualization
  • Post-production: QC, Compositing and visual fx, Video search
  • Promotional media: Generating multi-format artwork, posters, trailers; Synopsis 
  • Globalization/localization of content: Multi-language subtitling and dubbing

They also discussed the pros and cons of using an off-the-shelf framework like NVIDIA’s DeepStream SDK for computer vision workflows (ease of use, efficiency of setup) versus building your own modular workflow (customization, efficiency of use) with components like CV-CUDA Operators for pre- and post-processing of images and TensorRT for deep-learning inference. 

They also went into some detail on one application of computer vision in the post-production process, where they used object detection to identify when the clapperboard appeared in footage and sync the audio with the moment it closed, with sub-frame precision. This is something that has been a tedious, manual process for editors for decades in the motion picture industry and now they are able to automate it with consistent, precise results. While this is really more on the content creation side, it’s not hard to imagine how this same method could be used for automating some QA/QC processes for those on the content processing and distribution side. 

Ready to try GPU encoding in the cloud?

Bitmovin VOD Encoding now supports the use of NVIDIA GPUs for accelerated video transcoding. Specifically, we use NVIDIA T4 GPUs on AWS EC2 G4dn instances, which are now available to our customers simply by using our VOD_HARDWARE_SHORTFORM preset. This enables incredibly fast turnaround times using both H.264 and H.265 codecs. For time-critical short form content like sports highlights and news clips, it can make a huge difference. You can get started today with a Bitmovin trial and see the results for yourself. 
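As a rough illustration, the snippet below sketches how the hardware-accelerated preset might be requested through the Bitmovin Python SDK. The general classes shown (BitmovinApi, Encoding, CloudRegion, H264VideoConfiguration, PresetConfiguration) exist in the SDK, but whether VOD_HARDWARE_SHORTFORM is exposed exactly this way is an assumption; check the Bitmovin docs for the authoritative usage. The API key, names and region are placeholders.

```python
# Rough, hedged sketch of requesting the hardware-accelerated preset with the
# Bitmovin Python SDK (bitmovin-api-sdk). The exact mapping of the
# VOD_HARDWARE_SHORTFORM preset onto PresetConfiguration is an assumption;
# API key, names and cloud region are placeholders.

from bitmovin_api_sdk import (
    BitmovinApi, Encoding, CloudRegion, H264VideoConfiguration, PresetConfiguration,
)

api = BitmovinApi(api_key="YOUR_BITMOVIN_API_KEY")

# Encoding shell in a region with G4dn (NVIDIA T4) capacity.
encoding = api.encoding.encodings.create(
    Encoding(name="gpu-shortform-clip", cloud_region=CloudRegion.AWS_US_EAST_1)
)

# Codec configuration driven by the hardware short-form preset (assumed mapping).
video_config = api.encoding.configurations.video.h264.create(
    H264VideoConfiguration(
        name="h264-hardware-shortform",
        preset_configuration=PresetConfiguration.VOD_HARDWARE_SHORTFORM,
    )
)
```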

Related Links

Blog: GPU Acceleration for cloud video encoding

Guide: How to create an encoding using hardware acceleration 

Data Sheet: Raise the bar for short form content 

Supercharging Data Insights with AI for Video Analytics https://bitmovin.com/blog/analytics-ai-session-interpreter/ Sat, 24 Feb 2024

Introduction

At a recent internal hackathon, two of Bitmovin’s software engineers, Myriam Gantner and Thomas Sablattnig, explored whether AI could be used to process the large volume of data captured by Bitmovin Analytics into concise summaries and recommendations. The project was a success and has since been developed into a feature that is now available to Bitmovin customers. Keep reading to learn more about the new Analytics AI Session Interpreter. 

Background and motivation

Bitmovin Analytics allows video developers and technicians to track, monitor and analyze their video streams in real-time. It provides insights into user behavior, video player performance and much more. While it’s a valuable companion for Bitmovin’s Encoding and Player products, it can also stand alone and be used with several open source and commercial video players. It has a dedicated dashboard for visual interpretation, but can also export data for your own custom dashboards in products like Grafana or Looker Studio. 

Bitmovin Analytics collects a ton of data about the behavior and experience your customers have when watching videos, from simple metrics like play and pause duration to more technical information like video bitrate, DRM license exchange, adaptive bitrate switching and detailed logs around errors. There is a lot of information provided for both individual viewers and overall performance, so analysis can be time-consuming and sometimes overwhelming, especially if you don’t have a lot of technical or background knowledge about the entire video streaming workflow. 

What if we could use AI to process all the data and cut through the noise and make it easier to get to the source of streaming problems sooner? Could we make the insights provided more accessible to a wider audience that didn’t have technical expertise or streaming experience? Is it possible to build an analytics AI session interpreter with existing cloud resources? These are the questions we hoped to answer with this project. Keep reading to learn more about the process and results. 

Using Google’s Vertex AI + PaLM 2 models


Google’s Vertex AI

Vertex AI is a managed machine learning (ML) platform that leverages Google Cloud’s infrastructure and tools for building and training ML models. It has AutoML capabilities for quick and easy deployment, but also supports custom training for more advanced users and workflows. It integrates with other Google Cloud services like BigQuery and Pub/Sub for end-to-end intelligent data processing and analytics AI workflows. 

PaLM 2

PaLM 2 is Google’s next generation large language model that was released mid-2023. It excels at reasoning tasks and can decompose a complex task into simpler subtasks. It understands nuances of human language better than previous models and can perform multilingual translations, even when idioms or ambiguous and figurative language are used. 

Initial results

Once the initial workflow and connections were established, the next step was to see how smart it was out of the box. We began by feeding it the analytics data from a streaming session.

The question: “Can you please analyze this video streaming session?”

The answer:

The first event is a play event, which indicates that the user has started playing the video. 

The second event is a buffering event, which indicates that the video is buffering. 

The third event is a play event, which indicates that the video has finished playing.

Not so insightful. Obviously some more work was needed to get the results we were hoping for, so we began the process of improving and fine tuning the prompt/question and the model’s parameters.  

Crafting the “perfect” question

The iterative prompt fine-tuning process: 1) ask the question, 2) verify the result, 3) improve the question, 4) prepare and compress the data, repeating these steps as needed until the desired results were achieved.

Improving the prompt to get more insightful responses was a multi-step iterative process. We asked questions and verified the accuracy of the results, leading us toward better phrasing of questions for the best outputs. This involved more clearly defining what aspects we wanted the AI to summarize and also asking it to provide recommendations for improvements. We also provided explanations of the properties that were part of the analytics session data and added context about certain metrics, including thresholds and ideal values and ranges for specific metrics (i.e. what is considered a good video startup time). Part of this included descriptions of how these various metrics would impact the viewer experience. We also learned it was better to structure the analytics data in JSON format to make it easier for the AI to interpret.

In the end, our “perfect” question grew to ~150 lines long!
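For illustration, here is a heavily simplified sketch of that prompt-construction approach using the Vertex AI Python SDK’s PaLM 2 text model ("text-bison"). The session fields, metric thresholds and prompt wording are hypothetical stand-ins, not Bitmovin’s actual ~150-line production prompt, and the project/location values are placeholders.

```python
# Minimal sketch of the prompt-construction approach described above, using the
# Vertex AI Python SDK's PaLM 2 text model. Session fields, thresholds and
# prompt wording are hypothetical; project and location are placeholders.

import json
import vertexai
from vertexai.language_models import TextGenerationModel

vertexai.init(project="your-gcp-project", location="us-central1")
model = TextGenerationModel.from_pretrained("text-bison")

session = {  # hypothetical analytics session, structured as JSON for the model
    "playDurationSeconds": 6281,
    "bufferingEvents": 0,
    "qualityChanges": 53,
    "averageBitrateBps": 2179753,
    "videoStartupTimeMs": 900,
}

prompt = f"""You are a video streaming QoE analyst.
Context: a video startup time under 1000 ms is considered good; a high number
of quality changes usually indicates unstable throughput or aggressive ABR logic.
Analyze the following playback session, then give a short summary, an analysis
of likely causes, and concrete recommendations.

Session data (JSON):
{json.dumps(session, indent=2)}
"""

response = model.predict(prompt, temperature=0.2, max_output_tokens=512)
print(response.text)
```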

Summary, analysis, and recommended solutions

Here is an example of Analytics session data input and the more useful output we were seeing after refining the question and contextual data:

Session Data


Summarization

The user watched a live stream for 6281 seconds (1.7 hours). 

The duration of the live event is unknown. 

The video was not paused and there were no buffering events. However, there were 53 quality changes during the session. This could indicate that the network connection was unstable or that the ABR logic was not working properly. The average bitrate of the video was 2179753 bps. This is a relatively high bitrate, which could indicate that the video was of high quality.

Analysis

The video quality changed 53 times, which is a lot. This high number of quality changes in this session could be due to a number of factors. First, the network connection of the user may have been unstable, which could have caused the video to switch between different bitrates in order to maintain a smooth playback experience. Second, the ABR logic of the video player may not have been working properly, which could have also caused the video to switch between different bitrates unnecessarily. The lack of buffer events and buffer time in this session is a good sign. This means that the video did not experience any buffering issues, which would have been a major inconvenience for the user.

Recommendation

To improve the user experience, it would be helpful to investigate the cause of the quality changes. If the network connection is unstable, the user could try using a different network or connecting to a different location. If the ABR logic is set too aggressively, the user could try adjusting the settings. If the video is encoded inefficiently, the video could be re-encoded using a more efficient codec.

User-centric insights

Beyond the text analysis and recommendations, we experimented with other potential options for providing more accessible and concise summaries of viewing sessions. These included engagement, quality of service and quality of experience scores, and a “5 star” system that aggregated all metrics into a single rating.

JSON response with scores

{
    "engagement": 7,
    "qualityOfServiceScore": 9,
    "qualityOfExperienceScore": 9
}

5 star rating examples

RATING: 1

Explanation: The video session has a low rating because the buffering time is high. This could be due to a number of factors, such as a slow internet connection or a lack of available bandwidth. The session also has a high buffer event count, which indicates that the user experienced multiple buffering interruptions. This can be a very disruptive experience for users, as it can break the flow of the video and lead to frustration. Finally, the video startup time is also high, which could indicate that the video was not optimized for the user’s device. Overall, this video session has a number of issues that could negatively impact the user experience.

RATING: 5

Explanation: This session had a good video startup time and low buffering time. The quality change rate was also good. The only downside is that the user paused the video for a long time, which may have impacted their engagement with the content.

Audience-specific session analysis

Another potential option is to tailor the responses for specific personas with a variety of technical backgrounds, objectives and motivations when it comes to analytics data. You can imagine how a software developer, operations technician, product manager and sales or finance executive might want different levels of summarization and areas of focus when it comes to playback issues and how their own goals may be impacted. Just for fun, here’s an example session summary that was generated using the voice of a teenager (which may be more accurate for a teenager from the early 2000s than today).

“Teenager” session analysis:

“Yo, check it out. This dude watched a live stream on his phone for over an hour. That’s crazy! He must have been really into it. The video quality was pretty good, and there were no buffering issues. But then, after about 40 minutes, the video just stopped working. The dude was pissed. He tried to restart the app, but it didn’t work. He even tried to delete and reinstall the app, but that didn’t work either. So he gave up and just went to bed.

I don’t know what happened, but it’s a bummer that the video stopped working. I’m sure the dude was really enjoying it. But at least he got to watch it for a while.“

Bitmovin Analytics AI Session Interpreter available now

Since the initial hackathon project, we’ve continued fine-tuning the process while doing the background work for connecting our APIs to make the Analytics AI Session Interpreter a reality for Bitmovin Analytics customers.

We know that many companies are sensitive to having their data used by AI models, so we are ensuring:

  • Customers will need to explicitly enable this feature in their dashboard. Without that permission, no data will ever be sent to AI services.
  • No customer or user identifiable information will be sent to AI services.
  • Only encrypted data will be sent to AI services. 
  • AI services will only retain data for the time needed to generate the output.

Sign up today to get started with a free trial of Bitmovin and supercharge your data and insights with our Analytics AI Session Interpreter! 

Related resources and links

Website: Learn more about Bitmovin Analytics

Docs: How to set up Bitmovin Analytics

Guide: Using Bitmovin Analytics with Amazon IVS

Link: Google Vertex AI

Link: PaLM 2 Large Language Model

It’s a Wrap! NAB 2023 in Review with Stefan Lederer https://bitmovin.com/blog/bitmovin-nab-show-review/ Mon, 08 May 2023

Bitmovin joined the centennial celebration of the NAB Show 2023 in Las Vegas and came home with the Best of Show Award for the newly launched Smart Chunking innovation. We’ve got hold of our CEO, Stefan Lederer, for a wrap-up and some insights.

What were the biggest highlights for you this year?

NAB is an intense 5 days, but it’s always worth it for the satisfaction of seeing customers, partners and prospects in person. This year’s event was also super special because it was NAB’s 100th show! It made me reflect on how much the media and entertainment industry has changed and how much video has shaped the world – from the growth of broadcasting, which delivered landmark moments such as the first televised coronation of a monarch to the first music video to ever be aired on TV. The media and entertainment industry was further revolutionized by the growth of video streaming, which paved the way for companies such as Netflix, which fast became a cultural phenomenon by pioneering the Subscription Video on Demand (SVOD) business model and still remains a tastemaker for content and technology trends today. 

100 years of NAB also led me to think about Bitmovin’s own journey at the show as we prepare to celebrate our 10th anniversary as a company on 16th June. We began exhibiting at NAB around 7 years ago with a much smaller team and booth, so it’s incredible to see how far we’ve come from those days and what we’ve built together. This year’s NAB was an outstanding show for us, with a tonne of great announcements, including our CMCD partnership with Akamai, having Nomad on our booth and also taking home an NAB Best of Show Award for Smart Chunking!

AI has gained real momentum as a trend across the technology industry. What are your thoughts on that and did you see any exciting AI innovations at NAB?

A few years ago AI was a huge topic in the media and entertainment industry and the buzz for it has definitely returned because of major advancements in AI/ML with ChatGPT (which a lot of Bitmovers have had a lot of fun with, including using it to make a rap song about video codecs!) and generative AI. Although some might use it as a buzzword, it is great to see some practical applications of AI and Machine Learning in video streaming, such as  in our Video Complexity Analyzer and Super Resolution upscaling. With the ongoing interest in the industry, I am curious and excited to see what new solutions emerge, especially when it comes to improving efficiency and sustainability. 

Sustainability has also been pretty high on the news agenda, what are your thoughts on that?

It is encouraging to see the industry embracing more and more sustainability as a core value, and I think it’s great the industry’s first-ever Excellence in Sustainability Awards were presented at the show. Becoming more sustainable is something every company should be striving for, and as an industry,  we need to do better on a corporate and technical level. At Bitmovin, we are implementing new policies and guidelines to help us reduce our carbon footprint. We’ve also partnered with the University of Klagenfurt on a research project called GAIA, which has the ultimate goal of creating a climate-friendly video streaming platform that provides complete  energy awareness and accountability, including energy consumption and GHG emissions, along the entire delivery chain.

I did also hear some cool discussions related to sustainability at the show, including the prominent advantages of cloud-native solutions over the “lift-and-shift” or “rehosting” approach, which delivers promising results in terms of performance and carbon footprint reduction.

What were some of the other biggest trends you saw at NAB?

While AI and sustainability dominated discussions at NAB Show, there were also other big topics that are just as important. The first is ATSC 3.0 and 5G local networks, which represent the state of the art and the latest evolution within traditional broadcast television and telecommunications workflows. These technologies are enabling new possibilities for efficient content distribution, interactivity, and remote production. ATSC 3.0 had a huge presence in the convention hall this year, and there were a number of demonstrations around using private 5G pop-up networks for live contribution feeds, which is something that some of our partners in the Bitmovin Innovator Network are starting to take advantage of. 

On the other hand, WebRTC is a technology that is disrupting a lot of workflows that had traditionally been in the broadcasting or telecommunications domain. It’s been around for several years now, but we’re starting to see it evolve from being an open-source, DIY-type solution into something where there are now mature products and commercial offerings using WebRTC as the underlying technology. It’s pretty incredible that any modern web browser can now power 2-way or group video communication that used to require expensive, dedicated hardware.

Now NAB is over what are your plans?

The last few weeks have been incredibly busy for the entire team as we prepared for NAB. I also ran the Paris marathon, so I am definitely going to have a short break, but not for too long! NAB is a huge event to prepare for, but there is also a lot of work that needs to be done following the show with customers, prospects and partners.
