ATHENA’s first 5 years of research and innovation

Since forming in October 2019, the Christian Doppler Laboratory ATHENA at Universität Klagenfurt, run by Bitmovin co-founder Dr. Christian Timmerer, has been advancing research and innovation for adaptive bitrate (ABR) streaming technologies. Over the past five years, the lab has addressed critical challenges in video streaming from encoding and delivery to playback and end-to-end quality of experience. They are breaking new ground using edge computing, machine learning, neural networks and generative AI for video applications, contributing significantly to both academic knowledge and industry applications as Bitmovin’s research partner. 

In this blog, we’ll take a look at the highlights of the ATHENA lab’s work over the past five years and its impact on the future of the streaming industry.

Publications

ATHENA has made its mark with high-impact publications on the topics of multimedia, signal processing, and computer networks. Their research has been featured in prestigious journals such as IEEE Communications Surveys & Tutorials and IEEE Transactions on Multimedia. With 94 papers published or accepted by the time of the 5-year evaluation, the lab has established itself as a leader in video streaming research.

ATHENA also contributed to reproducibility in research. Their open source tools Video Complexity Analyzer and LLL-CAdViSE have already been used by Bitmovin and others in the industry. Their open, multi-codec UHD dataset enables research and development of multi-codec playback solutions for 8K video.  

ATHENA has also looked at applications of AI in video coding and streaming, something that will become more of a focus over the next two years. You can read more about ATHENA’s AI video research in this blog post.

Patents

But it’s not all just theoretical research. The ATHENA lab has successfully translated its findings into practical solutions, filing 16 invention disclosures and 13 patent applications. As of publication, 6 patents have been granted.

Workflow diagram for Fast Multi-Rate Encoding using convolutional neural networks. More detail available here.

PhDs

ATHENA has also made an educational impact, guiding the inaugural cohort of seven PhD students to successful dissertation defenses, with research topics ranging from edge computing in video streaming to machine learning applications in video coding.

There are also two postdoctoral scholars in the lab who have made significant contributions and progress.

Practical applications with Bitmovin

As Bitmovin’s academic partner, ATHENA plays a critical role in developing and enhancing technologies that can differentiate our streaming solutions. As ATHENA’s company partner, Bitmovin helps guide and test practical applications of the research, with regular check-ins for in-depth discussions about new innovations and potential technology transfers. The collaboration has resulted in several advancements over the years, including recent projects like CAdViSE and WISH ABR. 

CAdViSE

CAdViSE (Cloud based Adaptive Video Streaming Evaluation) is a framework for automated testing of media players. It allows you to test how different players and ABR configurations perform and react to fluctuations in different network parameters. Bitmovin is using CAdViSE to evaluate the performance of different custom ABR algorithms. The code is available in this GitHub repo.

WISH ABR

WISH stands for Weighted Sum model for HTTP Adaptive Streaming and it allows for customization of ABR logic for different devices and applications. WISH’s logic is based on a model that weighs bandwidth, buffer and quality costs for playing back a segment. By setting weights for the importance of those metrics, you create a custom ABR algorithm, optimized for your content and use case. You can learn more about WISH ABR in this blog post.

Decision process for WISH ABR, weighing data/bandwidth cost, buffer cost, and quality cost of each segment.
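
To make the weighted-sum idea concrete, here is a minimal sketch in Python. The cost terms, the weights, and the assumed 4-second segment duration are illustrative placeholders rather than the exact formulation used in WISH; they only show how weighting bandwidth, buffer, and quality costs leads to a per-segment decision.

```python
# Minimal sketch of a WISH-style weighted-sum ABR decision.
# The cost terms and weights below are illustrative placeholders,
# not the exact formulation used in the published algorithm.

def wish_select(candidates, throughput_bps, buffer_s, weights):
    """Pick the candidate bitrate with the lowest weighted cost."""
    w_bandwidth, w_buffer, w_quality = weights
    best, best_cost = None, float("inf")
    for c in candidates:                     # c is a candidate bitrate in bits per second
        segment_s = 4.0                      # assumed segment duration
        download_s = c * segment_s / throughput_bps
        bandwidth_cost = c / max(candidates)            # data usage, normalized
        buffer_cost = max(0.0, download_s - buffer_s)   # risk of draining the buffer
        quality_cost = 1.0 - c / max(candidates)        # penalty for low quality
        cost = (w_bandwidth * bandwidth_cost
                + w_buffer * buffer_cost
                + w_quality * quality_cost)
        if cost < best_cost:
            best, best_cost = c, cost
    return best

# Example: prefer quality but avoid stalls on a 6 Mbps connection with 10 s buffered.
ladder = [500_000, 1_200_000, 2_500_000, 5_000_000, 8_000_000]
print(wish_select(ladder, throughput_bps=6_000_000, buffer_s=10.0,
                  weights=(0.2, 1.0, 0.6)))
```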

Project spinoffs

The success of ATHENA has led to three spinoff projects:

APOLLO

APOLLO is funded by the Austrian Research Promotion Agency FFG and is a cooperative project between Bitmovin and Alpen-Adria-Universität Klagenfurt. The main objective of APOLLO is to research and develop an intelligent video platform for HTTP adaptive streaming that distributes video transcoding across large- and small-scale computing environments, using AI and ML techniques for the video distribution itself.

GAIA

GAIA is also funded by the Austrian Research Promotion Agency FFG and is a cooperative project between Bitmovin and Alpen-Adria-Universität Klagenfurt. The GAIA project researches and develops a climate-friendly adaptive video streaming platform that provides complete energy awareness and accountability along the entire delivery chain. It also aims to reduce energy consumption and GHG emissions through advanced analytics and optimizations on all phases of the video delivery chain.

SPIRIT

SPIRIT (Scalable Platform for Innovations on Real-time Immersive Telepresence) is an EU Horizon Europe-funded innovation action. It brings together cutting-edge companies and universities in the field of telepresence applications with advanced and complementary expertise in extended reality (XR) and multimedia communications. SPIRIT’s mission is to create Europe’s first multisite and interconnected framework capable of supporting a wide range of application features in collaborative telepresence.

What’s next

Over the next two years, the ATHENA project will focus on advancing deep neural network and AI-driven techniques for image and video coding. This work will include making video coding more energy- and cost-efficient, exploring immersive formats like volumetric video and holography, and enhancing QoE while being mindful of energy use. Other focus areas include AI-powered, energy-efficient live video streaming and generative AI applications for adaptive streaming. 

Get in touch or let us know in the comments if you’d like to learn more about Bitmovin and ATHENA’s research and innovation, AI, or sustainability-related projects.

The AI Video Research Powering a Higher Quality Future

*This post was originally published in June 2023. It was updated in May 2024 with more recent research publications and updates.*

This post will summarize the current state of Artificial Intelligence (AI) applications for video in 2024, including recent progress and announcements. We’ll also take a closer look at AI video research and collaboration between Bitmovin and the ATHENA laboratory that has the potential to deliver huge leaps in quality improvements and bring an end to playback stalls and buffering. This includes ATHENA’s FaRes-ML, which was recently granted a US Patent. Keep reading to learn more!

AI for video at NAB 2024

At NAB 2024, the AI hype train continued gaining momentum and we saw more practical applications of AI for video than ever before. We saw various uses of AI-powered encoding optimization, Super Resolution upscaling, automatic subtitling and translations, and generative AI video descriptions and summarizations. Bitmovin also presented some new AI-powered solutions, including our Analytics Session Interpreter, which won a Best of Show award from TV Technology. It uses machine learning and large language models to generate a summary, analysis and recommendations for every viewer session. The early feedback has been positive and we’ll continue to refine and add more capabilities that will help companies better understand and improve their viewers’ experience.

L to R: Product Manager Jacob Arends, CEO Stefan Lederer and Engineer Peter Eder accepting the award for Bitmovin’s AI-powered Analytics Session Interpreter

Other AI highlights from NAB included Jan Ozer’s “Beyond the Hype: A Critical look at AI in Video Streaming” presentation, NETINT and Ampere’s live subtitling demo using OpenAI Whisper, and Microsoft and Mediakind sharing AI applications for media and entertainment workflows. You can find more detail about these sessions and other notable AI solutions from the exhibition floor in this post.

FaRes-ML granted US Patent

For a few years before this recent wave of interest, Bitmovin and our ATHENA project colleagues have been researching the practical applications of AI for video streaming services. It’s something we’re exploring from several angles, from boosting visual quality and upscaling older content to more intelligent video processing for adaptive bitrate (ABR) switching. One of the projects that was first published in 2021 (and covered below in this post) is Fast Multi-Resolution and Multi-Rate Encoding for HTTP Adaptive Streaming Using Machine Learning (FaRes-ML). We’re happy to share that FaRes-ML was recently granted a US Patent! Congrats to the authors, Christian Timmerer, Hadi Amirpour, Ekrem Çetinkaya and the late Prof. Mohammad Ghanbari, who sadly passed away earlier this year.

Recent Bitmovin and ATHENA AI Research

In this section, I’ll give a short summary of projects that were shared and published since the original publication of this blog, and link to details for anyone interested in learning more. 

Generative AI for Adaptive Video Streaming

Presented at the 2024 ACM Multimedia Systems Conference, this research proposal outlines the opportunities at the intersection of advanced AI algorithms and digital entertainment for elevating quality, increasing user interactivity and improving the overall streaming experience. Research topics that will be investigated include AI generated recommendations for user engagement and AI techniques for reducing video data transmission. You can learn more here.

DeepVCA: Deep Video Complexity Analyzer

The ATHENA lab developed and released the open-source Video Complexity Analyzer (VCA) to extract and predict video complexity faster than existing methods like ITU-T’s Spatial Information (SI) and Temporal Information (TI). DeepVCA extends VCA using deep neural networks to accurately predict video encoding parameters, like bitrate, and the encoding time of video sequences. The spatial complexity of the current frame and previous frame are used to rapidly predict the temporal complexity of a sequence, and the results show significant improvements over unsupervised methods. You can learn more and access the source code and dataset here.

DeepVCA’s spatial and temporal complexity prediction process
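
As a rough illustration of the prediction step, the sketch below uses the spatial complexity of the previous and current frame as features for predicting temporal complexity. The synthetic data and the simple linear fit stand in for DeepVCA’s deep neural network and are assumptions made for this example only.

```python
# Sketch of the DeepVCA idea: predict a frame's temporal complexity from the
# spatial complexity of the current and previous frame. The linear model and
# synthetic data here are placeholders for the deep network used in DeepVCA.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic per-frame spatial complexity (e.g., texture energy) for a sequence.
spatial = rng.uniform(10, 80, size=200)
# Toy ground truth: temporal complexity loosely follows frame-to-frame change.
temporal = 0.6 * np.abs(np.diff(spatial)) + 0.1 * spatial[1:] + rng.normal(0, 1, 199)

# Features: spatial complexity of previous frame, current frame, plus a bias term.
X = np.column_stack([spatial[:-1], spatial[1:], np.ones(199)])
coef, *_ = np.linalg.lstsq(X, temporal, rcond=None)

pred = X @ coef
print("mean absolute error:", np.mean(np.abs(pred - temporal)))
```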

DIGITWISE: Digital Twin-based Modeling of Adaptive Video Streaming Engagement

DIGITWISE leverages the concept of a digital twin, a digital replica of an actual viewer, to model user engagement based on past viewing sessions. The digital twin receives input about streaming events and utilizes supervised machine learning to predict user engagement for a given session. The system model consists of a data processing pipeline, machine learning models acting as digital twins, and a unified model to predict engagement (XGBoost). The DIGITWISE system architecture demonstrates the importance of personal user sensitivities, reducing user engagement prediction error by up to 5.8% compared to non-user-aware models. It can also be used to optimize content provisioning and delivery by identifying the features that maximize engagement, providing an average engagement increase of up to 8.6%. You can learn more here.

System overview of DIGITWISE user engagement prediction
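
A digital twin of this kind can be pictured as a supervised regressor over per-session features. In the sketch below, the feature set, the synthetic data, and the hyperparameters are invented for illustration; only the use of a gradient-boosted model (XGBoost) mirrors the system described above.

```python
# Sketch of a DIGITWISE-style digital twin: a gradient-boosted model that maps
# per-session streaming events to an engagement score. Feature names and the
# synthetic data are assumptions for illustration only.
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(1)
n = 500
features = np.column_stack([
    rng.uniform(0, 10, n),     # number of stall events
    rng.uniform(0, 60, n),     # total stall duration (s)
    rng.uniform(1, 8, n),      # average bitrate (Mbps)
    rng.uniform(0, 20, n),     # number of quality switches
])
# Toy engagement target: watch-time fraction, hurt by stalls and switching.
engagement = np.clip(1.0 - 0.05 * features[:, 0] - 0.005 * features[:, 1]
                     + 0.02 * features[:, 2] - 0.01 * features[:, 3]
                     + rng.normal(0, 0.05, n), 0, 1)

twin = xgb.XGBRegressor(n_estimators=100, max_depth=4)
twin.fit(features, engagement)
print(twin.predict(features[:3]))   # predicted engagement for three sessions
```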

Previous Bitmovin and ATHENA AI Research

Better quality with neural network-driven Super Resolution upscaling

The first group of ATHENA publications we’re looking at all involve the use of neural networks to drive visual quality improvements using Super Resolution upscaling techniques. 

DeepStream: Video streaming enhancements using compressed deep neural networks

Deep learning-based approaches keep getting better at enhancing and compressing video, but the quality of experience (QoE) improvements they offer are usually only available to devices with GPUs. This paper introduces DeepStream, a scalable, content-aware per-title encoding approach to support both CPU-only and GPU-available end-users. To support backward compatibility, DeepStream constructs a bitrate ladder based on any existing per-title encoding approach, with an enhancement layer for GPU-available devices. The added layer contains lightweight video super-resolution deep neural networks (DNNs) for each bitrate-resolution pair of the bitrate ladder. For GPU-available end-users, this means ~35% bitrate savings at equivalent PSNR and VMAF quality scores, while CPU-only users receive the video as usual. You can learn more here.

DeepStream system architecture
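
One way to picture the packaging side of DeepStream is a regular bitrate ladder whose rungs optionally carry a super-resolution model as the enhancement layer. The resolutions, bitrates, model file names, and selection heuristic below are assumptions for illustration, not the ladder or models from the paper.

```python
# Sketch of a DeepStream-style ladder: each rung of a per-title bitrate ladder
# may carry a lightweight super-resolution model for GPU-capable clients.
# All values and the selection heuristic below are illustrative assumptions.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Rung:
    width: int
    height: int
    bitrate_kbps: int
    sr_model: Optional[str] = None  # enhancement layer; None means no upscaling model

ladder = [
    Rung(640, 360, 800, sr_model="sr_360p_to_1080p.onnx"),
    Rung(1280, 720, 2400, sr_model="sr_720p_to_1080p.onnx"),
    Rung(1920, 1080, 4800),  # top rung, played as-is
]

def pick_rung(throughput_kbps: int, has_gpu: bool) -> Rung:
    """CPU-only clients take the best rung they can afford; GPU clients may take
    a cheaper rung and upscale client-side with the attached SR model."""
    affordable = [r for r in ladder if r.bitrate_kbps <= throughput_kbps] or [ladder[0]]
    if has_gpu:
        upscalable = [r for r in affordable if r.sr_model]
        if upscalable:
            return min(upscalable, key=lambda r: r.bitrate_kbps)
    return max(affordable, key=lambda r: r.bitrate_kbps)

print(pick_rung(3000, has_gpu=True))    # low rung plus SR upscaling, saving bandwidth
print(pick_rung(3000, has_gpu=False))   # plays the 720p rung directly
```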

LiDeR: Lightweight video Super Resolution for mobile devices

Although DNN-based Super Resolution methods like DeepStream show huge improvements over traditional methods, their computational complexity makes it hard to use them on devices with limited power, like smartphones. Recent improvements in mobile hardware, especially GPUs, made it possible to use DNN-based techniques, but existing DNN-based Super Resolution solutions are still too complex. This paper proposes LiDeR, a lightweight video Super Resolution network specifically tailored toward mobile devices. Experimental results show that LiDeR can achieve competitive Super Resolution performance with state-of-the-art networks while improving the execution speed significantly. You can learn more here or watch the video presentation from an IEEE workshop.

Quantitative results comparing Super Resolution methods. LiDeR achieves near equivalent PSNR and SSIM quality scores while running ~3 times faster than its closest competition.

Super Resolution-based ABR for mobile devices

This paper introduces another new lightweight Super Resolution network, SR-ABR Net, that can be deployed on mobile devices to upgrade low-resolution/low-quality videos while running in real-time. It also introduces a novel ABR algorithm, WISH-SR, that leverages Super Resolution networks at the client to improve the video quality depending on the client’s context. By taking into account device properties, video characteristics, and user preferences, it can significantly boost the visual quality of the delivered content while reducing both bandwidth consumption and the number of stalling events. You can learn more here or watch the video presentation from Mile High Video.

System architecture for proposed Super Resolution based adaptive bitrate algorithm

Less buffering and higher QoE with applied machine learning

The next group of research papers involve applying machine learning at different stages of the video workflow to improve QoE for the end user.

FaRes-ML: Fast multi-resolution, multi-rate encoding

Fast multi-rate encoding approaches aim to address the challenge of encoding multiple representations from a single video by re-using information from already encoded representations. In this paper, a convolutional neural network is used to speed up both multi-rate and multi-resolution encoding for ABR streaming. Experimental results show that the proposed method for multi-rate encoding can reduce the overall encoding time by 15.08% and parallel encoding time by 41.26%. Simultaneously, the proposed method for multi-resolution encoding can reduce the encoding time by 46.27% for the overall encoding and 27.71% for the parallel encoding on average. You can learn more here.

FaRes-ML flowchart

ECAS-ML: Edge assisted adaptive bitrate switching

As video streaming traffic in mobile networks increases, utilizing edge computing support is a key way to improve the content delivery process. At an edge node, we can deploy ABR algorithms with a better understanding of network behavior and access to radio and player metrics. This project introduces ECAS-ML, Edge Assisted Adaptation Scheme for HTTP Adaptive Streaming with Machine Learning. It uses machine learning techniques to analyze radio throughput traces and balance the tradeoffs between bitrate, segment switches and stalls to deliver a higher QoE, outperforming other client-based and edge-based ABR algorithms. You can learn more here.

ECAS-ML system architecture

Challenges ahead

The road from research to practical implementation is not always quick or direct, or even possible in some cases, but fortunately that’s an area where Bitmovin and ATHENA have been working together closely for several years now. Going back to our initial implementation of HEVC encoding in the cloud, we’ve had success using small trials and experiments with Bitmovin’s clients and partners to provide real-world feedback for the ATHENA team, informing the next round of research and experimentation toward creating viable, game-changing solutions. This innovation-to-product cycle is already in progress for the research mentioned above, with promising early quality and efficiency improvements.

Many of the advancements we’re seeing in AI are the result of aggregating lots and lots of processing power, which in turn means lots of energy use. Even with processors becoming more energy efficient, the sheer volume involved in large-scale AI applications means energy consumption can be a concern, especially with increasing focus on sustainability and energy efficiency.  From that perspective, for some use cases (like Super Resolution) it will be worth considering the tradeoffs between doing server-side upscaling during the encoding process and client-side upscaling, where every viewing device will consume more power.  

Learn more

Want to learn more about Bitmovin’s AI video research and development? Check out the links below. 

Analytics Session Interpreter webinar

AI-powered video Super Resolution and Remastering

Super Resolution blog series

Super Resolution with Machine Learning webinar

ATHENA research

MPEG Meeting Updates 

GAIA project blogs

AI Video Glossary

Machine Learning – Machine learning is a subfield of artificial intelligence that deals with developing algorithms and models capable of learning and making predictions or decisions based on data. It involves training these algorithms on large datasets to recognize patterns and extract valuable insights. Machine learning has diverse applications, such as image and speech recognition, natural language processing, and predictive analytics.

Neural Networks – Neural networks are sophisticated algorithms designed to replicate the behavior of the human brain. They are composed of layers of artificial neurons that analyze and process data. In the context of video streaming, neural networks can be leveraged to optimize video quality, enhance compression techniques, and improve video annotation and content recommendation systems, resulting in a more immersive and personalized streaming experience for users.

Super Resolution – Super Resolution upscaling is an advanced technique used to enhance the quality and resolution of images or videos. It involves using complex algorithms and computations to analyze the available data and generate additional details. By doing this, the image or video appears sharper, clearer, and more detailed, creating a better viewing experience, especially on 4K and larger displays. 

Graphics Processing Unit (GPU) – A GPU is a specialized hardware component that focuses on handling and accelerating graphics-related computations. Unlike the central processing unit (CPU), which handles general-purpose tasks, the GPU is specifically designed for parallel processing and rendering complex graphics, such as images and videos. GPUs are widely used in various industries, including gaming, visual effects, scientific research, and artificial intelligence, due to their immense computational power.

Video Understanding – Video understanding is the ability to analyze and comprehend the information present in a video. It involves breaking down the visual content, movements, and actions within the video to make sense of what is happening.

PhD video research: From the ATHENA lab to Bitmovin products



Introduction

The story of Bitmovin began with video research and innovation back in 2012, when our co-founders Stefan Lederer and Christopher Mueller were students at Alpen-Adria-Universität (AAU) Klagenfurt. Together with their professor Dr. Christian Timmerer, the three co-founded Bitmovin in 2013, with their research providing the foundation for Bitmovin’s groundbreaking MPEG-DASH player and Per-Title Encoding. Five years later in 2018, a joint project between Bitmovin and AAU called ATHENA was formed, with a new laboratory and research program that would be led by Dr. Timmerer. The aim of ATHENA was to research and develop new approaches, tools and evaluations for all areas of HTTP adaptive streaming, including encoding, delivery, playback and end-to-end quality of experience (QoE). Bitmovin could then take advantage of the knowledge gained to further innovate and enhance its products and services. In the late spring and summer of 2023, the first cohort of ATHENA PhD students completed their projects and successfully defended their dissertations. This post will highlight their work and its potential applications. 

Bitmovin co-founders Stefan Lederer, Christopher Mueller, and Christian Timmerer celebrating the opening of the Christian Doppler ATHENA Laboratory with Martin Gerzabek and Ulrike Unterer from the Christian Doppler Research Association. (Photo: Daniel Waschnig)

Video Research Projects

Optimizing QoE and Latency of Live Video Streaming Using Edge Computing and In-Network Intelligence

Dr. Alireza Erfanian

The work of Dr. Erfanian focused on leveraging edge computing and in-network intelligence to enhance the QoE and reduce end-to-end latency in live ABR streaming. The research also addresses improving transcoding performance and optimizing costs associated with running live streaming services and network backhaul utilization. 

  1. Optimizing resource utilization – Two new methods, ORAVA and OSCAR, utilize edge computing, network function virtualization, and software-defined networking (SDN). At the network’s edge, virtual reverse proxies collect clients’ requests and send them to an SDN controller, which creates a multicast tree to deliver the highest requested bitrate efficiently. This approach minimizes streaming cost and resource utilization while considering delay constraints. ORAVA, a cost-aware approach, and OSCAR, an SDN-based live video streaming method, collectively save up to 65% bandwidth compared to state-of-the-art approaches, reducing OpenFlow commands by up to 78% and 82%, respectively.
  2. Light-Weight Transcoding – These three new approaches utilize edge computing and network function virtualization to significantly improve transcoding efficiency. LwTE is a novel light-weight transcoding approach at the edge that saves time and computational resources by storing optimal results as metadata during the encoding process. It employs store and transcode policies based on popularity, caching popular segments at the edge. CD-LwTE extends LwTE by proposing Cost- and Delay-aware Light-weight Transcoding at the Edge, considering resource constraints, introducing a fetch policy, and minimizing total cost and serving delay for each segment/bitrate. LwTE-Live investigates the cost efficiency of LwTE in live streaming, leveraging the approach to save bandwidth in the backhaul network. Evaluation results demonstrate LwTE processes transcoding at least 80% faster, while CD-LwTE reduces transcoding time by up to 97%, decreases streaming costs by up to 75%, and reduces delay by up to 48% compared to state-of-the-art approaches.
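
As a rough illustration of LwTE’s store/transcode idea, the toy policy below caches popular segment representations and transcodes unpopular ones on demand. The popularity threshold and cost model are invented for this example and are not the policies from the paper.

```python
# Sketch of an LwTE-style store/transcode policy at the edge: popular segments
# are cached fully, unpopular ones are transcoded on demand using stored
# metadata. The popularity threshold and cost numbers are illustrative assumptions.
def edge_policy(segment_popularity: float, store_cost: float,
                transcode_cost: float, popularity_threshold: float = 0.7) -> str:
    """Return 'store' or 'transcode' for one segment/bitrate pair."""
    # Expected cost of keeping the segment cached vs. transcoding it per request.
    expected_store = store_cost
    expected_transcode = transcode_cost * segment_popularity
    if segment_popularity >= popularity_threshold or expected_store <= expected_transcode:
        return "store"
    return "transcode"

print(edge_policy(0.9, store_cost=1.0, transcode_cost=0.4))   # popular -> store
print(edge_policy(0.1, store_cost=1.0, transcode_cost=0.4))   # unpopular -> transcode
```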

Slides and more detail


Video Coding Enhancements for HTTP Adaptive Streaming using Machine Learning

Dr. Ekrem Çetinkaya

The research of Dr. Çetinkaya involved several applications of machine learning techniques for improving the video coding process across 4 categories:

  1. Fast Multi-Rate Encoding with Machine Learning – These two techniques address the challenge of encoding multiple representations of a video for ABR streaming. FaME-ML utilizes convolutional neural networks to guide encoding decisions, reducing parallel encoding time by 41%. FaRes-ML extends this approach to multi-resolution scenarios, achieving a 46% reduction in overall encoding time while preserving visual quality.
  2. Enhancing Visual Quality on Mobile Devices – These three methods focused on improving visual quality on mobile devices with limited hardware. SR-ABR integrates super-resolution into adaptive bitrate selection, saving up to 43% bandwidth. LiDeR addresses computational complexity, achieving a 428% increase in execution speed while maintaining visual quality. MoViDNN facilitates the evaluation of machine learning solutions for enhanced visual quality on mobile devices.
  3. Light-Field Image Coding with Super-Resolution – This new approach addresses the data size challenge of light field images in emerging media formats. LFC-SASR utilizes super-resolution to reduce data size by 54%, ensuring a more immersive experience while preserving visual quality.
  4. Blind Visual Quality Assessment Using Vision Transformers – A new technique, BQ-ViT, tackles the blind visual quality assessment problem for videos. Leveraging the vision transformer architecture, BQ-ViT achieves a high correlation (0.895 PCC) in predicting video visual quality using only the encoded frames.

Slides and more detail


Policy-driven Dynamic HTTP Adaptive Streaming Player Environment

Dr. Minh Nguyen

The work of Dr. Nguyen addressed critical issues impacting QoE in adaptive bitrate (ABR) streaming, with four main contributions:

  1. Days of Future Past Plus (DoFP+) – This approach uses HTTP/3 features to enhance QoE by upgrading low-quality segments during streaming sessions, resulting in a 33% QoE improvement and a 16% reduction in downloaded data.
  2. WISH ABR – This is a weighted sum model that allows users to customize their ABR switching algorithm by specifying preferences for parameters like data usage, stall events, and video quality. WISH considers throughput, buffer, and quality costs, enhancing QoE by up to 17.6% and reducing data usage by 36.4%.
  3. WISH-SR – This is an ABR scheme that extends WISH by incorporating a lightweight Convolutional Neural Network (CNN) to improve video quality on high-end mobile devices. It can reduce downloaded data by up to 43% and enhance visual quality with client-side Super Resolution upscaling. 
  4. New CMCD Approach – This new method for determining Common Media Client Data (CMCD) parameters enables the server to generate suitable bitrate ladders based on clients’ device types and network conditions. This approach reduces downloaded data while improving QoE by up to 2.6 times.
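
To illustrate the CMCD direction, the sketch below builds a CMCD query argument with a few standard keys (buffer length, measured throughput, top bitrate, session ID) and shows a hypothetical server-side rule that trims the bitrate ladder based on the reported throughput. The key names follow the CMCD specification (CTA-5004); the ladder-selection rule is invented for the example and is not the method proposed in the research.

```python
# Sketch of how a client might attach Common Media Client Data (CMCD) to a
# segment request as a query argument, and how a server could use a couple of
# the reported values to pick a bitrate ladder. Key names follow CTA-5004;
# the ladder-selection rule itself is a made-up illustration.
from urllib.parse import quote

def cmcd_query(buffer_ms: int, throughput_kbps: int, top_bitrate_kbps: int,
               session_id: str) -> str:
    payload = f'bl={buffer_ms},mtp={throughput_kbps},tb={top_bitrate_kbps},sid="{session_id}"'
    return "CMCD=" + quote(payload)

def choose_ladder(mtp_kbps: int) -> list:
    """Hypothetical server-side rule: cap the ladder near the reported throughput."""
    full = [500, 1200, 2500, 5000, 8000]
    return [b for b in full if b <= mtp_kbps * 1.2] or full[:1]

print(cmcd_query(12000, 3500, 8000, "abc123"))
print(choose_ladder(3500))
```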

Slides and more detail  


Multi-access Edge Computing for Adaptive Video Streaming

Dr. Jesús Aguilar Armijo

The network plays a crucial role in video streaming QoE, and one of the key technologies available on the network side is Multi-access Edge Computing (MEC). Its key characteristics, including computing power, storage, proximity to the clients, and access to network and player metrics, make it possible to deploy mechanisms at the MEC node that assist video streaming.

The thesis of Dr. Aguilar Armijo investigates how MEC capabilities can be leveraged to support video streaming delivery, specifically to improve QoE, reduce latency, or increase savings on storage and bandwidth.

  1. ANGELA Simulator – A new simulator is designed to test mechanisms supporting video streaming at the edge node. ANGELA addresses issues in state-of-the-art simulators by providing access to radio and player metrics, various multimedia content configurations, Adaptive Bitrate (ABR) algorithms at different network locations, and a range of evaluation metrics. Real 4G/5G network traces are used for radio layer simulation, offering realistic results. ANGELA demonstrates a significant simulation time reduction of 99.76% compared to the ns-3 simulator in a simple MEC mechanism scenario.
  2. Dynamic Segment Repackaging at the Edge – The proposal suggests using the Common Media Application Format (CMAF) in the network’s backhaul, performing dynamic repackaging of content at the MEC node to match clients’ requested delivery formats. This approach aims to achieve bandwidth savings in the network’s backhaul and reduce storage costs at the server and edge side. Measurements indicate potential reductions in delivery latency under certain expected conditions.
  3. Edge-Assisted Adaptation Schemes – Leveraging radio network and player metrics at the MEC node, two edge-assisted adaptation schemes are proposed. EADAS improves ABR decisions on-the-fly to enhance clients’ Quality of Experience (QoE) and fairness. ECAS-ML shifts the entire ABR algorithm logic to the edge, managing the tradeoff among bitrate, segment switches, and stalls through machine learning techniques. Evaluations show significant improvements in QoE and fairness for both schemes compared to various ABR algorithms.
  4. Segment Prefetching and Caching at the Edge – Segment prefetching, a technique transmitting future video segments closer to the client before being requested, is explored at the MEC node. Different prefetching policies, utilizing resources and techniques such as Markov prediction, machine learning, transrating, and super-resolution, are proposed and evaluated. Results indicate that machine learning-based prefetching increases average bitrate while reducing stalls and extra bandwidth consumption, offering a promising approach to enhance overall performance.
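
As a small illustration of the Markov-prediction flavor of segment prefetching, the sketch below counts bitrate transitions across past sessions and prefetches the representation most likely to be requested next. The traces and decision rule are invented for this example and are not the policies evaluated in the thesis.

```python
# Toy sketch of Markov-prediction-based segment prefetching at an edge node:
# count transitions between the bitrates clients requested and prefetch the
# representation most likely to be requested next. Traces and the decision
# rule are illustrative assumptions.
from collections import Counter, defaultdict

def build_transition_model(sessions):
    """sessions: lists of consecutively requested bitrates (one list per client)."""
    model = defaultdict(Counter)
    for trace in sessions:
        for prev, nxt in zip(trace, trace[1:]):
            model[prev][nxt] += 1
    return model

def prefetch_candidate(model, last_requested):
    counts = model.get(last_requested)
    if not counts:
        return last_requested          # no history: assume the client stays put
    return counts.most_common(1)[0][0]

history = [[800, 2400, 2400, 4800], [800, 2400, 4800, 4800], [2400, 2400, 800]]
model = build_transition_model(history)
print(prefetch_candidate(model, 2400))   # most likely next bitrate after 2400 kbps
```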

Slides and more detail


Potential applications for Bitmovin products

The WISH ABR algorithm presented by Dr. Nguyen is already available in the Bitmovin Web Player SDK as of version 8.136.0, which was released in early October 2023. It can be enabled via AdaptationConfig.logic. Use of CMCD metadata is still gaining momentum throughout the industry, but Bitmovin and Akamai have already demonstrated a joint solution and the research above will help improve our implementation.

Bitmovin has experimented with server-side Super Resolution upscaling with some customers, mainly focusing on upscaling SD content to HD for viewing on TVs and larger monitors, but the techniques investigated by Dr. Çetinkaya take advantage of newer models that can extend Super Resolution to the client side on mobile devices. These have the potential to reduce data usage which is especially important to users with limited data plans and bandwidth. They can also improve QoE and visual quality while saving service providers on delivery costs. 

Controlling costs has been at or near the top of the list of challenges video developers and streaming service providers have faced over the past couple of years according to Bitmovin’s annual Video Developer Report. This trend will likely continue into 2024 and the resource management and transcoding efficiency improvements developed by Dr. Erfanian will help optimize and reduce operational costs for Bitmovin and its services. 

Edge computing is becoming more mainstream, with companies like Bitmovin partners Videon and Edgio delivering new applications that take advantage of available compute resources closer to the end user. The contributions developed by Dr. Aguilar Armijo address different facets of content delivery and provide a comprehensive approach to optimizing video streaming in edge computing environments. This has the potential to provide more actionable analytics data and enable more intelligent and robust adaptation during challenging network conditions. 

Conclusion

Bitmovin was born from research and innovation and 10 years later is still breaking new ground. We were honored to receive a Technology & Engineering Emmy Award for our efforts and remain committed to improving every part of the streaming experience. Whether it’s taking advantage of the latest machine learning capabilities or developing novel approaches for controlling costs, we’re excited for what the future holds. We’re also grateful for all of the researchers, engineers, technology partners and customers who have contributed along the way and look forward to the next 10 years of progress and innovation.

The post PhD video research: From the ATHENA lab to Bitmovin products appeared first on Bitmovin.

]]>
GAIA Research Project: A 3 Month Look Back

A few months ago, Bitmovin and the University of Klagenfurt announced a new collaboration on a research project called GAIA, which aims to make video streaming more sustainable. Project ‘GAIA’ is co-funded by the Austrian Research Promotion Agency (FFG) and will focus on helping the video-streaming industry reduce its carbon footprint. 

Dr. Christian Timmerer is a Professor at the Institute of Information Technology (ITEC) at the University of Klagenfurt and one of the co-founders of Bitmovin. We had a quick chat with him to see how Project GAIA is progressing.

For those who aren’t aware, why is it important to reduce video streaming’s carbon footprint?

Climate change is the biggest threat facing this generation, requiring urgent action. We are already seeing the impact of climate change around the world, with record-breaking temperatures and more natural disasters. Everyone has to work together in the coming years if we are to turn the tide against it, including everyone working in the video streaming industry. 

Currently, internet data traffic is responsible for more than half of digital technology’s global impact, accounting for 55% of its annual energy consumption. Video processing and streaming generate 306 million tons of CO2, which is 20% of digital technology’s total GHG emissions and nearly 1% of worldwide GHG emissions. It’s why Bitmovin and the University of Klagenfurt are working together on Project GAIA to enable more climate-friendly video streaming solutions providing better energy awareness and efficiency through the end-to-end video workflow. 

Combining Bitmovin’s history of innovation in video streaming with the University of Klagenfurt’s strong academic background in technology research means we are well-primed to help make video streaming more sustainable.

We are now three months into Project GAIA. What have been some of the key learnings?

Over the last three months, we have been deeply focused on investigating the challenges and opportunities associated with reducing emissions in video streaming. One thing we have been examining is data centres, which handle the video encoding process and storage of video content. They consume huge amounts of power, but there are ways to make them more sustainable, including selecting energy-optimized and sustainable cloud services to help reduce CO2 emissions; identifying cloud computing regions with a low carbon footprint; using more efficient and faster transcoders and encoders; and optimizing the video encoding parameters to reduce the bitrates of encoded videos without affecting quality.

We have also identified challenges and opportunities in video delivery within heterogeneous networks. Ways of reducing carbon emissions centre around energy-efficient network technology for video streaming and lower data transmission to reduce energy consumption. Lastly, we have also examined challenges and opportunities in end-user devices. Research actually shows that end-user devices and decoding hardware make up the greatest amount of energy consumption and CO2 emissions in the video delivery chain. We believe the best carbon emission reduction strategies lie in improving the energy efficiency of end users’ devices by improving screen display technologies or shifting from desktops to using more energy-efficient laptops, tablets, and smartphones.

What role will GAIA play in helping reduce video streaming’s carbon footprint?

I am incredibly excited about Project GAIA and the results it will yield. Our aim is to design a  climate-friendly adaptive video streaming platform that provides complete energy awareness and accountability, including energy consumption and GHG emissions along the entire delivery chain, from content creation and server-side encoding to video transmission and client-side rendering; and reduced energy consumption and GHG emissions through advanced analytics and optimizations on all phases of the video delivery chain.
Our research will focus on providing benchmarking, energy-aware and machine learning-based modelling, optimization algorithms, monitoring, and auto-tuning, which will provide more quantifiable data on energy consumption in video streaming through the video delivery chain. Eventually, we hope to be able to use our findings to optimize encoding, streaming and playback concerning energy consumption.

The GAIA Research Project: Creating a Climate-Friendly Video Streaming Platform

We’re excited to share that Bitmovin and the University of Klagenfurt are collaborating on a new research project with the goal of making video streaming more sustainable. Project ‘GAIA’ is co-funded by the Austrian Research Promotion Agency (FFG) and will help enable more climate-friendly video streaming solutions by providing better energy awareness and efficiency through the end-to-end video workflow.

Dr. Christian Timmerer is an Associate Professor at the Institute of Information Technology (ITEC) at the University of Klagenfurt and one of the co-founders of Bitmovin. We asked him a few questions to learn more about the goals and motivation behind the ‘GAIA’ project. 

When you co-founded Bitmovin back in 2013, was there any focus on sustainability? What changes have you seen over the last 10+ years?

Christian: With our research background, we always tried to utilize the latest technology and research results which includes our focus on video codecs. For example, our first FFG-funded project termed “AdvUHD-DASH” aimed at integrating HEVC into our video encoding workflow; later, we were among the first to showcase AV1 live streaming (2017 NAB award); and now we’re already successfully experimenting with VVC (Collaboration with Fraunhofer HHI). 

Each new generation of video codec reduces the amount of storage by approximately 50%, which contributes to sustainability goals. Over the past 10+ years, there has been a shift to focus on more efficient usage of the available resources, where in the beginning of video streaming over the internet, much was solved using massive over-provisioning. I think this is no longer the case, and people are starting to think about environmental and climate-friendly video streaming solutions in the industry.

GAIA is a two-year joint research project between Bitmovin and the University of Klagenfurt. What is the end goal, and how soon do you think there will be actionable results and recommendations?

Christian: The results of the GAIA project will (i) enable complete awareness and accountability of the energy consumption and GHG emissions and (ii) provide efficient strategies from encoding and streaming to playback and analytics that will minimize average energy consumption.

In the beginning, we will mainly focus on collecting data and benchmarking systems with regard to energy consumption, which will hopefully lead to publicly available datasets useful for both industry and academia at large, serving as a baseline for later improvements. Later we will use those findings to optimize encoding, streaming and playback concerning energy consumption by following and repeating the traditional “design – implement – analyze” work cycles to iteratively devise and improve solutions.

Will the results of this research be exclusive to Bitmovin?

Christian: We will showcase results at the usual trade shows like NAB and IBC, while scientific results and findings will be published in renowned conferences and journals. We will try to make them publicly available as much as possible to increase the impact and adoption of these technologies within the industry and academia. 

This is the fourth time Bitmovin and the University of Klagenfurt have collaborated on a research project. What makes this one unique?

Christian: Environmental-friendliness was always implicitly addressed within the scope of previous research projects; GAIA is unique as it makes this an explicit goal, allowing us to address these issues as the top priority. 

You can read more details about the GAIA research project here.

Ready to make your own video workflows more eco-friendly with HEVC or AV1 encoding? Sign up for a free Bitmovin trial today!

ATHENA Lab: Fast Multi-Resolution and Multi-Rate Encoding for HTTP Adaptive Streaming Using Machine Learning (FaRes-ML)

The heterogeneity of devices on the Internet and the differences in users’ network conditions make it tricky to design a video delivery tool that adapts to all these differences while maximizing the quality of experience (QoE) for each user. HTTP Adaptive Streaming (HAS) is the de-facto solution for video delivery over the Internet. In HAS, multiple representations are stored for each video, with each representation having a different quality level and/or resolution. This way, HAS streaming sessions can alternate between different quality options based on the network and viewing conditions while delivering the content. However, the requirement to store multiple representations for a single video in HAS brings additional encoding challenges since the source video needs to be encoded efficiently at multiple bitrates and resolutions. Multi-rate encoding aims to tackle this problem.
This blog post introduces our new approach to multi-rate encoding: Fast Multi-Resolution and Multi-Rate Encoding for HTTP Adaptive Streaming Using Machine Learning (FaRes-ML). But first…

What is Multi-Rate Encoding?

In multi-rate encoding, a single source video needs to be encoded at multiple bitrates and resolutions in order to provide a suitable representation for a variety of network and viewing conditions. The quality level of the encoded video is controlled by the quantization parameter (QP) in the encoder. An example multi-rate encoding scheme is given in Fig.1.

Multi-Rate Encoding workflow

This is a computationally expensive process due to the high data size of videos and the high complexity of video codecs. However, since all of these representations consist of the same content, there is a significant amount of redundancy between them. Multi-rate encoding approaches exploit this redundancy to speed up the encoding process.
In multi-rate encoding, a representation is chosen as the reference representation (usually the highest [1] or the lowest quality [2] representation), and its information is used to speed up encoding the remaining dependent representations. Since block partitioning is one of the most time-consuming processes in the encoding pipeline, the majority of multi-rate encoding approaches focus on speeding up this portion of the process.
In block partitioning, each frame is divided into smaller pieces called blocks to achieve more precise motion compensation. Smaller block sizes are used for motion-intense areas, while larger block sizes are used for stationary areas.
The High-Efficiency Video Coding (HEVC) standard uses a Coding Tree Unit (CTU) for block partitioning. By default, each CTU covers a 64×64 pixels-sized square region and each CTU can be divided recursively up to three times, with the smallest block size being 8×8 pixels. Each split operation increases the depth level by 1 (i.e. depth 0 for 64×64 pixels and depth 3 for 8×8 pixels). An example of block partitioning for a frame is illustrated in Fig.2.

Block partitioning using a CTU
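
As a toy illustration of the depth 0 to 3 quadtree described above, the sketch below splits a 64×64 block recursively wherever a simple motion-intensity measure is high. Real encoders make split decisions through rate-distortion optimization; the threshold and the synthetic motion map here are purely illustrative.

```python
# Toy illustration of HEVC-style CTU partitioning: a 64x64 block is split
# recursively (up to depth 3, i.e. down to 8x8) wherever a simple "motion
# intensity" measure exceeds a threshold. Real encoders use rate-distortion
# optimization, not this heuristic.
import numpy as np

def split_ctu(motion, x, y, size, depth, decisions):
    block = motion[y:y + size, x:x + size]
    if depth < 3 and block.mean() > 0.45:       # motion-intense -> split further
        half = size // 2
        for dy in (0, half):
            for dx in (0, half):
                split_ctu(motion, x + dx, y + dy, half, depth + 1, decisions)
    else:
        decisions.append((x, y, size, depth))

# Synthetic motion map: intensity grows from the left edge to the right edge.
_, xx = np.mgrid[0:64, 0:64]
motion = xx / 63.0

decisions = []
split_ctu(motion, 0, 0, 64, 0, decisions)
print(len(decisions), "leaf blocks, sizes used:", sorted({s for _, _, s, _ in decisions}))
```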

Introducing the FaRes-ML

FaRes-ML uses Convolutional Neural Networks (CNNs) to predict the CTU split decisions for the dependent representations. The highest-quality representation at the lowest resolution is chosen as the reference representation. The reference is taken from the lowest resolution to improve parallel encoding performance, since in parallel encoding the most complex representation bounds the overall encoding time; choosing the reference from a low resolution therefore shortens the parallel encoding time.
The encoding process in FaRes-ML consists of three main steps:

  1. The reference representation is encoded with the HEVC reference encoder. Then, the encoding information obtained is stored to be used while encoding the dependent representations. 
  2. Once the encoding information is obtained, the pixel values from the source video in the corresponding resolution and the encoding information from the reference representation are fed into the CNN for the given quality level and resolution.
  3. The output from the CNN is the split decision for the given depth level. This decision is used to speed up the encoding of the dependent representation.
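
A rough sketch of this three-step flow is shown below. The encoder and CNN calls are stubs standing in for the HEVC reference encoder and the trained split-prediction networks, and the ladder, file names, and QP values are assumptions made for illustration.

```python
# High-level sketch of the FaRes-ML flow described above. The encoder and CNN
# calls are stubs; in the real system these are the HEVC reference encoder and
# the trained split-prediction networks.
from typing import List, Tuple

Block = Tuple[int, int, int, int]  # (x, y, size, depth)

def encode_reference(video: str, resolution: Tuple[int, int], qp: int) -> List[Block]:
    """Step 1: encode the lowest-resolution, highest-quality representation
    and return its CTU split decisions (stubbed here)."""
    return [(0, 0, 64, 0)]

def predict_splits(video: str, resolution: Tuple[int, int], qp: int,
                   reference_splits: List[Block]) -> List[Block]:
    """Step 2: a CNN predicts split decisions for a dependent representation
    from the source pixels and the reference metadata (stubbed here)."""
    return reference_splits

def encode_dependent(video: str, resolution: Tuple[int, int], qp: int,
                     splits: List[Block]) -> str:
    """Step 3: encode the dependent representation, skipping the block
    partitioning search by reusing the predicted splits (stubbed here)."""
    return f"{video}_{resolution[1]}p_qp{qp}.hevc"

ladder = [((960, 540), 30), ((1920, 1080), 30), ((1920, 1080), 24)]
reference = encode_reference("source.y4m", (960, 540), qp=22)
outputs = [encode_dependent("source.y4m", res, qp,
                            predict_splits("source.y4m", res, qp, reference))
           for res, qp in ladder]
print(outputs)
```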

The overall encoding scheme of FaRes-ML is given in Fig.3.

FaRes-ML Encoding Scheme Workflow

To measure the encoding performance of the FaRes-ML approach, we compared the results to the HEVC reference software (HM 16.21) and the lower bound approach [3]. FaRes-ML achieves 27.71% time saving for the parallel encoding and 46.27% for the overall encoding while maintaining a minimal bitrate increase (2.05%). The resulting normalized encoding time graph is given in Fig.4.

Fast Multi-Rate Encoding efficiency comparison: FaRes-ML vs. Lower Bound vs. HEVC

Conclusion

As the quality of content resolution improves to new heights with 4K+ resolutions becoming the norm, organizations and researchers are finding new ways to improve the back-end delivery technologies to match the content to its respective device. One of the latest approaches to improving the speed of encoding is the FaRes-ML method, a machine learning-based approach that handles multiple representations in different qualities and resolutions. By applying CNNs to exploit the redundant information in the multi-rate encoding pipeline, FaRes-ML is capable of speeding up overall encodings by nearly 50% in ATHENA’s early-stage experiments, with additional improvements for parallel encoding, all while maintaining a minimal bitrate increase.
Although the FaRes-ML method has been proven in lab environments for single and parallel encodes, its potential can be extended to cover even more encoding decisions (e.g., reference frame selection) to further improve encoding performance in the near future. Furthermore, extending the proposed method to recent video codecs such as Versatile Video Coding (VVC) would be interesting due to the increased encoding complexity of recent video coding standards; it could significantly reduce the time it takes organizations that operate a back-end workflow to implement the brand-new codec.
The team at ATHENA will work closely with Bitmovin in the coming months to determine how FaRes-ML works in real-world applications. If you’re interested in learning more about the Fast Multi-Resolution and Multi-Rate Encoding approach, you can find the full study published in the IEEE Open Journal of Signal Processing as an open-access article. More information about the full study can be found in the following links:

If you liked this article, check out some of our other great ATHENA content at the following links:

Sources

[1] D. Schroeder, A. Ilangovan, M. Reisslein, and E. Steinbach, “Efficient multi-rate video encoding for HEVC-based adaptive HTTP streaming,” IEEE Trans. Circuits Syst. Video Technol., vol. 28, no. 1, pp. 143–157, Jan. 2018.
[2] K. Goswami et al., “Adaptive multi-resolution encoding for ABR streaming,” in Proc. 25th IEEE Int. Conf. Image Process., 2018, pp. 1008–1012.
[3] H. Amirpour, E. Çetinkaya, C. Timmerer, and M. Ghanbari, “Fast multi-rate encoding for adaptive HTTP streaming,” in Proc. Data Compression Conf., 2020, pp. 358–358.

ATHENA Lab: Improving Viewer Experiences with Scalable Light Field Coding (SLFC)


Immersive Viewer Experiences with Light Field Imaging 

Light field imaging is a promising technology that will provide a more immersive viewing experience. It enables post-processing tasks like depth estimation, changing the viewport, refocusing, etc. To this end, a huge amount of data needs to be collected, processed, stored, and transmitted, which makes the compression and transmission of light field images a challenging task [1]. Unlike conventional photography, which integrates the rays from all directions into a single pixel, light field imaging records the rays from each direction separately, resulting in a multiview representation of the scene. An example of a multiview representation of a light field image is shown in Fig 1 below and in an interactive format here:

Fig 1. Multiview representation of a light field image. (u,v) represents the view number, and (x,y) denotes pixels inside each view. [2]
Light field image coding solutions exploit the high redundancy that exists between the views of a light field. Pseudo Video Sequence-based (PVS) solutions convert the views of a light field into a sequence of pictures and encode the resulting pseudo video using an advanced video encoder. This methodology leverages the dependency between views to reduce redundancy and improve the encoding efficiency of light field compression. In other words, PVS employs a similar idea to per-title encoding, wherein similar features are identified and carried over from view to view so that recurring content does not have to be encoded again. However, as the technology behind PVS solutions develops further, new challenges for other important functionalities of light field coding arise, such as viewport scalability, quality scalability, viewport random access, and uniform quality distribution among viewports.
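
As a small illustration of the PVS idea, the sketch below treats each (u,v) view as one video frame and orders the views with a simple serpentine scan so that neighboring frames stay similar before the sequence is handed to a standard video encoder. The array shape and the scan order are assumptions for illustration, not the ordering used by any particular codec.

```python
# Sketch of the pseudo video sequence (PVS) idea: treat each (u, v) view of a
# light field as one frame and order the views so that consecutive frames are
# as similar as possible (here a simple serpentine scan), before handing the
# sequence to an ordinary video encoder. The 4D array shape is illustrative.
import numpy as np

U, V, H, W = 9, 9, 32, 48                 # 9x9 views of 32x48-pixel images
light_field = np.random.default_rng(3).random((U, V, H, W))

def serpentine_pvs(lf: np.ndarray) -> np.ndarray:
    """Return the views as a (U*V, H, W) frame sequence in serpentine order."""
    frames = []
    for u in range(lf.shape[0]):
        row = range(lf.shape[1]) if u % 2 == 0 else reversed(range(lf.shape[1]))
        for v in row:
            frames.append(lf[u, v])
    return np.stack(frames)

pvs = serpentine_pvs(light_field)
print(pvs.shape)   # (81, 32, 48) -> encode as an 81-frame video
```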
In this post, we introduce a novel light field coding solution, namely Scalable Light Field Coding (SLFC), which addresses the above-mentioned functionalities in addition to encoding efficiency. 

Functionalities of Light Field Coding

Aside from the baseline task of reducing redundancy across the multiple views, light field coding needs to support four key functionalities:

  • Viewport scalability: Because of the high dependency between views, conventional light field coding solutions require all views to be encoded, transmitted, and decoded, even when the display only needs some of them; a conventional 2D display, for instance, shows only the central view, a far less immersive experience. To make light field coding compatible with different capturing devices, displays, network conditions, processing power, and storage capacity, the viewports should instead be grouped into layers [3] so that they can be encoded, transmitted, decoded, and displayed one after another, which is a significantly more complex task than conventional coding.
  • Quality scalability: To increase compatibility with network conditions and processing power, light field images can be provided at two (or more) quality levels. As more bandwidth and/or processing power becomes available, the quality can be improved by transmitting the remaining layers.
  • Viewport random access: To avoid decoding delays, high bandwidth requirements, and excessive processing while navigating between viewports, random access (the number of views that must be decoded to access a specific view) should be considered in light field image coding.
  • Uniform quality distribution: To avoid quality fluctuations when navigating between viewports, the views of a light field image should have similar quality at each bitrate.

Introducing SLFC: Scalable Light Field Coding

To address the additional functionalities required on top of standard light field coding, we propose the Scalable Light Field Coding (SLFC) solution. The first functionality SLFC addresses is viewport scalability: the views are divided into seven layers that are encoded layer by layer.

Fig 2. Seven layers of multiview encoding

In each layer, the views shown in red belong to that layer, gray views belong to previous layers, and black views belong to subsequent layers. To provide compatibility with 2D displays, the first layer contains only the central view. The second layer contains the four corner views. For the remaining layers, the available horizontal and vertical intermediate views are added.

Encoding the views

Each layer is encoded in a three-step process defined by the horizontal and vertical relationships between layers and views:

  • First, the central view (the first layer, shown as the red central dot in Fig. 2) is intra-coded independently of all other views.
  • Second, the views of the second layer (the four corners) are encoded independently of each other, each using the central view as its reference picture.
  • The remaining layers consist of horizontal and vertical intermediate views of previously encoded views. In layer 3, for example, the four possible horizontal and vertical intermediate views are added. In each of layers 3 to 7, two views from previously encoded layers are used to synthesize their intermediate view; Sepconv [4], a network designed for video frame interpolation, is used for this view synthesis. You can see an example of this process in the image below:

 

Fig 3. The right-most view in layer 3 is synthesized from the top-right and bottom-right views of layer 2.

In the example above, the intermediate view is synthesized from the top-right and bottom-right views to give the most accurate available prediction of the view being coded. Because the synthesized view leaves less residual data than either the top-right or bottom-right view alone, it is added to the encoder’s reference list as a virtual reference frame. All in all, four reference views are used for encoding each view in layers 3 to 7: (i) the central view, (ii, iii) the two views used to synthesize the virtual reference frame, and (iv) the synthesized view itself.
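To make this reference structure concrete, here is a minimal sketch of how the reference list for a view in layers 3 to 7 could be assembled. The data structures and the `synthesize_intermediate` function are hypothetical placeholders (the latter stands in for the Sepconv-based view synthesis); the sketch only mirrors the four-reference layout described above, not the actual encoder integration.

```python
# Minimal sketch of the SLFC reference list for a view in layers 3-7.
from dataclasses import dataclass
from typing import List

@dataclass
class View:
    u: int
    v: int
    layer: int
    pixels: object = None  # decoded picture, omitted in this sketch

def synthesize_intermediate(a: View, b: View) -> View:
    """Hypothetical stand-in for Sepconv-based view synthesis: in the real
    method, a learned frame-interpolation network predicts the view halfway
    between the two previously coded parent views."""
    return View(u=(a.u + b.u) // 2, v=(a.v + b.v) // 2,
                layer=max(a.layer, b.layer) + 1)

def reference_list(central: View, parent_a: View, parent_b: View) -> List[View]:
    """Four references per view in layers 3 to 7: (i) the central view,
    (ii, iii) the two previously coded parent views, and (iv) their
    synthesized intermediate, inserted as a virtual reference frame."""
    virtual = synthesize_intermediate(parent_a, parent_b)
    return [central, parent_a, parent_b, virtual]

# Example mirroring Fig. 3: the right-most layer-3 view is predicted from
# the top-right and bottom-right corner views of layer 2 plus the central view.
central = View(4, 4, layer=1)
top_right = View(0, 8, layer=2)
bottom_right = View(8, 8, layer=2)
print([(r.u, r.v) for r in reference_list(central, top_right, bottom_right)])
```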

Experimental results of Applied SLFC 

Encoding efficiency: The encoding efficiency for the Table light field test image [5], compared to the JPEG Pleno anchor [6], WaSP [7], MuLE [8], and PSB [9], is shown in Fig. 4. BD-Rate and BD-PSNR for the other test images against the best competitor (PSB) are given in Table 1.

Fig. 4: RD-curves for the Table test image.

 

Table 1. BD-Rate (%) and BD-PSNR of SLFC vs. PSB.
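For readers who want to reproduce such a comparison, the sketch below computes the standard Bjøntegaard delta metrics (BD-Rate and BD-PSNR) from two rate-distortion curves by fitting cubic polynomials in the log-rate domain and integrating over the overlapping interval. The sample RD points are made-up placeholders, not the values behind Fig. 4 or Table 1.

```python
# Bjontegaard delta metrics: average bitrate difference (BD-Rate, in %)
# and average PSNR difference (BD-PSNR, in dB) between two RD curves.
import numpy as np

def bd_rate(rate_ref, psnr_ref, rate_test, psnr_test):
    """Average bitrate change of the test curve vs. the reference, in %."""
    log_ref, log_test = np.log10(rate_ref), np.log10(rate_test)
    p_ref = np.polyfit(psnr_ref, log_ref, 3)      # log-rate as f(PSNR)
    p_test = np.polyfit(psnr_test, log_test, 3)
    lo, hi = max(min(psnr_ref), min(psnr_test)), min(max(psnr_ref), max(psnr_test))
    int_ref = np.polyval(np.polyint(p_ref), hi) - np.polyval(np.polyint(p_ref), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    avg_diff = (int_test - int_ref) / (hi - lo)
    return (10 ** avg_diff - 1) * 100

def bd_psnr(rate_ref, psnr_ref, rate_test, psnr_test):
    """Average PSNR gain of the test curve vs. the reference, in dB."""
    log_ref, log_test = np.log10(rate_ref), np.log10(rate_test)
    p_ref = np.polyfit(log_ref, psnr_ref, 3)      # PSNR as f(log-rate)
    p_test = np.polyfit(log_test, psnr_test, 3)
    lo, hi = max(min(log_ref), min(log_test)), min(max(log_ref), max(log_test))
    int_ref = np.polyval(np.polyint(p_ref), hi) - np.polyval(np.polyint(p_ref), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    return (int_test - int_ref) / (hi - lo)

# Placeholder RD points (bitrate in kbps, PSNR in dB) -- not the paper's data.
r1, q1 = [100, 200, 400, 800], [32.0, 35.0, 38.0, 41.0]
r2, q2 = [ 90, 180, 360, 720], [32.5, 35.4, 38.3, 41.2]
print(f"BD-Rate: {bd_rate(r1, q1, r2, q2):.2f} %, BD-PSNR: {bd_psnr(r1, q1, r2, q2):.2f} dB")
```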

Scalability: The number of views inside each layer and the bitrate allocated to each layer at bpp = 0.75 are shown in Fig. 5.

Fig. 5: (left) the number of views inside each layer; (right) the bitrate allocated to each layer at bpp = 0.75.

Random Access: The required bitrate to access each view at bpp = 0.75 for Table test image is shown in Fig. 6.

Fig. 6: The required bitrate to access each view at bpp = 0.75.

Quality Scalability: The synthesized view is considered quality layer 1; using the synthesized view as a reference for inter-coding yields quality layer 2.
Quality Distribution: The PSNR heatmap for the Table light field test image at bpp = 0.005 is shown in Fig. 7.

Fig. 7: PSNR heatmap plot for the Table light field test image at bpp = 0.005.

Conclusion

We developed Scalable Light Field Coding (SLFC) to improve on “standard” light field coding by adding functionality on top of efficient compression. Our method adds several critical features: viewport scalability (controlling how many views are delivered), quality scalability, viewport random access, and uniform quality distribution (keeping quality differences between views small). Our results show that SLFC improves the quality of experience (QoE) for multiview content by a significant margin. In the future, applying SLFC to video and image workflows will help create more immersive, higher-quality VR/AR experiences, conceivably allowing consumers to truly feel like they are inside the environment being simulated.
Check out our full study and more at the following link here.

Sources:

[1] C. Conti, L. D. Soares, and P. Nunes, “Dense Light Field Coding: A Survey,” in IEEE Access, vol. 8, pp. 49244-49284, 2020, DOI: 10.1109/ACCESS.2020.2977767.
[2] G. Wu et al., “Light Field Image Processing: An Overview,” in IEEE Journal of Selected Topics in Signal Processing, vol. 11, no. 7, pp. 926-954, Oct. 2017, DOI: 10.1109/JSTSP.2017.2747126.
[3] Ricardo Jorge Santos Monteiro, “Scalable light field representation and coding,” 2020.
[4] S. Niklaus, L. Mai, and F. Liu, “Video Frame Interpolation via Adaptive Separable Convolution,” 2017 IEEE International Conference on Computer Vision (ICCV), Venice, 2017, pp. 261-270, DOI: 10.1109/ICCV.2017.37.
[5] Katrin Honauer, Ole Johannsen, Daniel Kondermann, and Bastian Goldluecke, “A dataset and evaluation methodology for depth estimation on 4D light fields,” in Computer Vision – ACCV 2016, Shang-Hong Lai, Vincent Lepetit, Ko Nishino, and Yoichi Sato, Eds., Cham, 2017, pp. 19–34, Springer International Publishing.
[6] F. Pereira, C. Pagliari, E. A. B. da Silva, I. Tabus, H. Amirpour, M. Bernardo, and A. Pinheiro, “JPEG Pleno light field coding common test conditions v3.2,” Doc. ISO/IEC JTC 1.
[7] P. Astola and I. Tabus, “WaSP: Hierarchical warping, merging, and sparse prediction for light field image compression,” in Proc. 7th European Workshop on Visual Information Processing (EUVIP), Oct. 2018, pp. 435–439.
[8] M. B. de Carvalho, M. P. Pereira, G. Alves, E. A. B. da Silva, C. L. Pagliari, F. Pereira, and V. Testoni, “A 4D DCT-Based lenslet light field codec,” in 2018 25th IEEE International Conference on Image Processing (ICIP), Oct 2018, pp. 435–439.
[9] L. Li, Z. Li, B. Li, D. Liu, and H. Li, “Pseudo-Sequence-Based 2-D Hierarchical Coding Structure for Light-Field Image Compression,” in 2017 Data Compression Conference (DCC), April 2017, pp. 131–140.

The post ATHENA Lab: Improving Viewer Experiences with Scalable Light Field Coding (SLFC) appeared first on Bitmovin.

]]>
ATHENA Labs: Improving the Quality and Efficiency of Live Video Streaming with Optimizing Resource Utilization in Live Video Streaming (OSCAR) https://bitmovin.com/blog/multicast-live-video-streaming-oscar/ Thu, 08 Apr 2021 16:00:33 +0000 https://bitmovin.com/?p=164377 Live video streaming is a specific type of streaming that a video is broadcasted in real-time. The actual source of the video can be pre-recorded or simultaneously recorded. Live streaming is suitable for live venues, conferences, and gaming. In recent years the demands for watching live venues such as news, concerts, and sports have increased,...

The post ATHENA Labs: Improving the Quality and Efficiency of Live Video Streaming with Optimizing Resource Utilization in Live Video Streaming (OSCAR) appeared first on Bitmovin.

]]>
Live video streaming is a type of streaming in which video is broadcast in real time; the source itself can be pre-recorded or captured as it is broadcast. Live streaming is suitable for live venues, conferences, and gaming. In recent years, demand for watching live events such as news, concerts, and sports has increased, with an additional boost due to the COVID-19 pandemic. Moreover, new applications like e-learning, online gaming, worship, e-commerce, and social networks such as Facebook and Instagram further increase the demand for live streaming support. On the client side, a large number of devices and applications with different capabilities (such as display resolution) have emerged, increasing the demand for video streaming with characteristics such as higher resolution, higher perceptual visual quality, and higher frame rates. To satisfy these needs, it is crucial to offer multiple customized services, such as different quality levels and resolutions of the various video representations.

How to Improve Quality of Live Video Streams

Separate representation transfers

The first, naive solution is to transfer each requested representation separately to each client. Because the number of representations is limited while the number of viewers of a live video can be very large, the same quality level must be transferred from the origin server to the corresponding clients many times. This generates redundant traffic and wastes a significant amount of limited network bandwidth, degrading the quality of other users and services.
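As a quick worked example (the numbers are hypothetical, purely for illustration): if 1,000 viewers behind the same base station all request the same 5 Mbps representation, the naive approach pushes 1,000 separate copies, roughly 5 Gbps, across the core network, even though a single 5 Mbps copy would carry exactly the same information; in general the redundant core traffic grows as (number of viewers per representation − 1) × representation bitrate.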

Transferring representation with multicast trees

The alternative solution is to employ multicast. Here, the service provider creates a multicast “tree”, rooted at the origin server, for each requested quality level, reaching the corresponding clients. The origin server sends a single copy of each requested representation through its multicast tree, and the video packets are duplicated automatically in network elements such as routers, switches, and cellular base stations wherever the tree branches. Fig. 1 below shows a simple scenario in which a video is delivered in different qualities to seven clients connected through different base stations. Each delivered quality level is depicted in a different color, is independent, and must be delivered separately, but duplicated quality levels are sent only once along a common path; see, for example, QId-4 (blue) between “S1” and “P3”.

Fig. 1: Multicast example

The multicasting approach results in a considerable reduction in bandwidth utilization, especially in the internet’s core where the origin server is located. However, this approach still faces several challenges. First, each router has to maintain the state of a multicast group, which requires complicated operations in routers. Second, IP multicast routers do not have a global view of the network status and can hardly determine optimal multicast trees to ensure end-to-end quality of service (QoS) requirements. Finally, the multicast topology for video streaming is usually dynamic, i.e., clients can join and leave on the fly. However, current IP networks are not able to re-configure routing paths dynamically and adaptively. 

Introducing the OSCAR approach

To alleviate the current issues of classic multicasting mentioned above, the Christian Doppler Laboratory ATHENA at Alpen-Adria-Universität Klagenfurt proposes OSCAR (On Optimizing Resource Utilization in Live Video Streaming) as a new live video streaming approach. OSCAR employs two types of Virtual Network Functions (VNFs): 

  1. A set of virtual reverse proxy servers (VRPs) that are applied at the edge of the network to aggregate the clients’ requests and send them to a Software-Defined Networking (SDN) controller.
  2. A set of virtual transcoder functions (VTFs) to serve clients’ requested quality levels by transcoding them from the highest quality level. 

After gathering the requests from the VRPs, the controller executes an optimization model to determine a multicast tree from the origin server to an appropriate subset of VTFs. As illustrated in Fig. 2, using VTF(s) reduces bandwidth usage by sending only the highest requested quality level (here QId-4) from the origin server to the VTF(s) over a multicast tree. Since the VTFs are responsible for satisfying the VRPs’ requests, they produce the lower quality levels from the highest quality level and then transmit them to the VRPs in a multicast fashion. For example, as depicted in Fig. 2, QId-4 is delivered from S1 to a VTF on P6 and then transcoded to the clients’ requested quality levels at P6.

Fig. 2: OSCAR approach

The OSCAR approach can be summarized in four overarching steps (a simplified sketch of the controller and VTF logic follows the list):

  1. In the first step, the VRPs gather clients’ requests (such as join, leave, and quality-change requests) and update the SDN controller accordingly.
  2. The SDN controller runs an optimization model to determine a multicast tree from the origin server to the VRPs that passes through the VTFs.
  3. The VTFs produce the VRPs’ requested quality levels by transcoding from the highest quality level.
  4. The last step comprises applying the outputs of the SDN controller to the network (e.g., setting up datapaths), running the VTFs, and then transmitting data from the origin server to the requesting VRPs.
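The sketch below mimics, under simplifying assumptions, what the controller and the VTFs do with the aggregated requests: only the single highest requested representation is pushed from the origin over the multicast tree, and each VTF transcodes it down to whatever its VRPs asked for. The bitrates, request sets, and function names are illustrative placeholders, not the optimization model described in the paper.

```python
# Toy sketch of OSCAR's core idea: the origin multicasts only the highest
# requested representation; each VTF transcodes it into the lower
# representations its VRPs asked for. Bitrates and requests are made up.

BITRATE_KBPS = {1: 1_000, 2: 2_500, 3: 5_000, 4: 8_000}  # QId -> bitrate

def plan(requests_per_vtf: dict) -> dict:
    """requests_per_vtf maps a VTF id to the set of QIds its VRPs need.
    Returns what the origin multicasts and what each VTF must transcode
    (a VTF whose VRPs asked for the highest QId serves it directly)."""
    highest = max(q for qs in requests_per_vtf.values() for q in qs)
    return {
        "origin_sends": highest,
        "vtf_transcodes": {vtf: sorted(q for q in qs if q != highest)
                           for vtf, qs in requests_per_vtf.items()},
    }

def origin_egress(requests_per_vtf: dict, use_oscar: bool) -> int:
    """Origin egress in kbps: OSCAR sends one copy of the highest quality
    over the multicast tree; the naive baseline sends every requested
    quality separately towards each edge location."""
    if use_oscar:
        highest = max(q for qs in requests_per_vtf.values() for q in qs)
        return BITRATE_KBPS[highest]
    return sum(BITRATE_KBPS[q] for qs in requests_per_vtf.values() for q in qs)

requests = {"VTF@P3": {1, 4}, "VTF@P6": {2, 3, 4}}
print(plan(requests))
print("origin egress, naive :", origin_egress(requests, use_oscar=False), "kbps")
print("origin egress, OSCAR :", origin_egress(requests, use_oscar=True), "kbps")
```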

The new OSCAR approach lets more viewers watch higher-quality content with lower overall bandwidth expenditure (measured at the origin server) and significantly faster delivery, by delivering only the quality representations that were actually requested.

Conclusion

Throughout our testing of the OSCAR approach and its algorithms, we found that using VTFs results in substantial savings in network bandwidth because the other requested quality levels are produced by transcoding at the edge. We evaluated OSCAR by comparing bandwidth usage and network path selection effort for the open-source Tears of Steel video sequence, encoded with the AVC codec using the “superfast” preset. In the end, our most recent OSCAR test showed a 65% reduction in bandwidth usage and a 75% reduction in path selection overhead compared with state-of-the-art approaches.
To view the full research and analysis, download our paper, published in IEEE Transactions on Network and Service Management, from the official IEEE website.
Citation: A. Erfanian, F. Tashtarian, A. Zabrovskiy, C. Timmerer, and H. Hellwagner, “OSCAR: On Optimizing Resource Utilization in Live Video Streaming,” in IEEE Transactions on Network and Service Management, vol. 18, no. 1, pp. 552-569, March 2021, DOI: 10.1109/TNSM.2021.3051950.
Learn more about the ATHENA Christian Doppler (CD) Laboratory here.
Acknowledgment: The financial support of the Austrian Federal Ministry for Digital and Economic Affairs, the National Foundation for Research, Technology and Development, and the Christian Doppler Research Association, is gratefully acknowledged.
About the Authors:
This project is a collaboration between the ATHENA lab, Alpen-Adria-Universität Klagenfurt, and Bitmovin.

  • Christian Timmerer is an Associate Professor at Alpen-Adria-Universität Klagenfurt, lead researcher at ATHENA, and CIO & Head of Research at Bitmovin
  • Hermann Hellwagner is a full professor at Alpen-Adria-Universität Klagenfurt and lead researcher at ATHENA
  • Alireza Erfanian is a researcher and Ph.D. student at the Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität Klagenfurt
  • Farzad Tashtarian is a post-doctoral researcher at ATHENA
  • Anatoliy Zabrovskiy is a researcher and lecturer at Alpen-Adria-Universität Klagenfurt


The post ATHENA Labs: Improving the Quality and Efficiency of Live Video Streaming with Optimizing Resource Utilization in Live Video Streaming (OSCAR) appeared first on Bitmovin.

]]>
Bitmovin & Austrian University to team up for innovative Video Transmission Technology research https://bitmovin.com/blog/bitmovin-aau-video-transmission-technology/ Thu, 23 Jan 2020 12:21:28 +0000 https://bitmovin.com/?p=90942 Bitmovin and the University of Klagenfurt Collaborate on Innovative Video Transmission Technology Klagenfurt, Austria / 21 January 2020 – Bitmovin, a world leader in online video technology, is teaming up with the University of Klagenfurt and the Austrian Federal Ministry of Digital and Economic Affairs (BMDW) in a multi-million Euro research project to uncover video...

The post Bitmovin & Austrian University to team up for innovative Video Transmission Technology research appeared first on Bitmovin.

]]>
Bitmovin and the University of Klagenfurt Collaborate on Innovative Video Transmission Technology

Klagenfurt, Austria / 21 January 2020 – Bitmovin, a world leader in online video technology, is teaming up with the University of Klagenfurt and the Austrian Federal Ministry of Digital and Economic Affairs (BMDW) in a multi-million Euro research project to uncover video transmission technology techniques that will enhance the video streaming experiences of the future.
The joint project establishes a dedicated research team to investigate potential new tools and methodologies for encoding, transport, and playback of live and on-demand video using the HTTP Adaptive Streaming protocol that is widely used by online video and TV providers. The resulting findings will help empower the creation of next-generation solutions for higher quality video experiences at lower latency, while also potentially reducing storage and distribution costs.
**This is a crossposted article originally featured here on Platform Comms and here in German**

From left to right: Bitmovin founders S. Lederer, C. Müller, & C. Timmerer receiving a certificate of accomplishment from AAU

Margarete Schramböck, Federal Minister for Digital and Economic Affairs, sees great potential for the future in the development of technologies of this kind: “Video represents 60% of the Internet data volume and, correspondingly, the potential for optimization and resource conservation is enormous. At the same time, the Christian Doppler Laboratory contributes to the development of high-tech in Carinthia, secures jobs and draws qualified personnel to the region. A win-win situation for companies, science, and society.”

Fierce competition increases the need for innovation

Bitmovin CIO, C. Timmerer

“The partnership with the University of Klagenfurt allows us to investigate the basic building blocks of video delivery in greater detail. This will help us to remain in pole position in the years ahead”, as Christopher Müller, CTO at Bitmovin states. Christian Timmerer, Associate Professor at the Institute of Information Technology (ITEC) at the University of Klagenfurt and Laboratory Director, goes on to explain: “Increasing competition between online video providers will accelerate the need for innovation. We continuously strive to maintain the optimum balance between cost, quality of user experience and increasing complexity of content.” 

Ministry of Economic Affairs provides support through the Christian Doppler Research Association

The Christian Doppler Laboratory ATHENA is jointly funded by Bitmovin and the Christian Doppler Research Association, whose primary public sponsor is the Federal Ministry of Digital and Economic Affairs. The budget for 7 years of research is approx. 4.5 million Euros, with the public sector providing roughly 2.7 million of this total. Martin Gerzabek, President of the Christian Doppler Research Association, sees great potential for cooperation between science and industry, as in this case: “ATHENA is our first Christian Doppler Laboratory at the University of Klagenfurt. We are very pleased about the expansion of our funding model, which facilitates cooperation between outstanding science and innovative companies on an equal footing. We congratulate the University of Klagenfurt on this great success and confidently look forward to furthering CD labs and JR centers in the region.” 
According to Oliver Vitouch, Rector of the University of Klagenfurt, “ATHENA offers a fantastic opportunity for further pioneering developments in global leading-edge technologies. Video streaming has permeated our everyday lives; most of us use it on a daily basis. This lab of the future is an ideal blend of research and innovation”. In Klagenfurt, members of the Institute for Information Technology have been working on the development of innovative video transmission technology for around 20 years. Bitmovin, which operates on a global scale and maintains sites on three continents today, originally began its operations in Klagenfurt: The three founders (Stefan Lederer CEO, Christopher Müller CTO, and Christian Timmerer CIO) first collaborated on the development of the MPEG-DASH video streaming standard during their time at the University of Klagenfurt. This standard is currently used by YouTube, Netflix, ORF-TVThek, Flimmit and many more besides. 

About Bitmovin

Bitmovin was founded in 2013 by Stefan Lederer, Christopher Müller, and Christian Timmerer as a spinoff of the University of Klagenfurt, where they worked on the standardization of MPEG-DASH, a major standard for video streaming, during their time as students. The start-up company found its first home in the neighboring Lakeside Science & Technology Park. Today, the company provides the world’s most powerful products for highly efficient video streaming on the Internet. Large, international customers such as the BBC or Hulu Japan rely on solutions developed in Carinthia.
Since participating in the renowned Y Combinator program in the USA, the official corporate headquarters are located in San Francisco. However, the two locations in Austria remain the centers of excellence for research and development – not least due to the strong ties to the University of Klagenfurt. Over the course of two financing rounds in 2016 and 2018, the company was able to secure over 40 million dollars in venture capital from international investors.

Industry-Leading Video Technologies

Bitmovin technology innovations focus on video encoding, playback, and analytics around user experiences. Another feature is industry-leading transcode speeds, reaching 100 times real-time. The Bitmovin Player runs on the widest array of compelling consumer devices, ranging from mobile handheld devices to large-screen televisions fed by dongle devices or with native smart TV capabilities, providing a rich feature set with consistent UIs and APIs. Bitmovin’s newest analytics product provides multi-screen audience and QoS data to analyze and optimize every play in real time.
Most recently, Bitmovin was granted up to 20 million euros by the European Investment Bank to finance research and development as well as investments in sales and marketing in the coming years. Market-oriented, forward-looking product development and research at the cutting edge earn Bitmovin awards time and again, such as the “Phoenix” start-up prize in 2016, one of the most prestigious start-up prizes in Austria, with which the Austria Wirtschaftsservice GmbH (AWS), the Austrian Research Promotion Agency (FFG) and the Federation of Austrian Industries (IV) recognize outstanding research achievements and innovative product ideas.

About Christian Doppler Laboratories

In Christian Doppler Laboratories, application-oriented basic research is carried out at a high level, which involves outstanding scientists cooperating with innovative companies. The Christian Doppler Research Association is internationally regarded as a best-practice example for the promotion of this type of cooperation. Christian Doppler Laboratories are jointly financed by the public sector and the participating companies. The primary public sponsor is the Federal Ministry of Digital and Economic Affairs (BMDW). 

About the University of Klagenfurt

Since its foundation in 1970, the University of Klagenfurt has successfully established itself as one of six state universities with a broad range of subjects in Austria. More than 11,600 students pursue their studies and research at the University of Klagenfurt; around 2000 of these are international students. Approximately 1,500 employees strive for top quality in teaching and research. According to the QS World University Rankings (“Top 50 under 50”) the university belongs to the 150 best young universities worldwide. In the Times Higher Education World University Rankings 2020, which endeavor to rank the top 1,400 universities across the globe, it placed in the 301-350 range. In the discipline of Computer Science, the University of Klagenfurt was third place among Austrian universities in the 201-250 range. One of the university’s key research strengths lies in “networked and autonomous systems”.
 
Further inquiries:
Assoc.-Prof. Dr. Christian Timmerer
+43 463 2700 3621
Christian.Timmerer@aau.at

The post Bitmovin & Austrian University to team up for innovative Video Transmission Technology research appeared first on Bitmovin.

]]>