Multimedia Communication

Hadi

Authors: Hadi Amirpour (AAU, Austria), Lingfeng Qu (Guangzhou University, China), Jong Hwan Ko (SKKU, South Korea), Cosmin Stejerean (Meta, USA), Christian Timmerer (AAU, Austria)

Conference: IEEE Visual Communications and Image Processing (IEEE VCIP 2024) – Tokyo, Japan, December 8-11, 2024

Abstract: As video dimensions, including resolution, frame rate, and bit depth, increase, a larger bitrate is required to maintain a higher Quality of Experience (QoE). While videos are often optimized for resolution and frame rate to improve compression and energy efficiency, the impact of color space is often overlooked. Larger color spaces are essential for avoiding color banding and delivering High Dynamic Range (HDR) content with richer, more accurate colors, although this comes at the cost of higher processing energy. This paper investigates the effects of bit depth and color subsampling on video compression efficiency and energy consumption. By analyzing different bit depths and subsampling schemes, we aim to determine optimized settings that balance compression efficiency with energy consumption, ultimately contributing to more sustainable and high-quality video delivery. We evaluate both encoding and decoding energy consumption and assess the quality of videos using various metrics including PSNR, VMAF, ColorVideoVDP, and CAMBI. Our findings offer valuable insights for video codec developers and content providers aiming to improve the performance and environmental footprint of their video streaming services.

Index Terms— Video encoding, video decoding, video quality, bit depth, color subsampling, energy.
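To give a feel for why bit depth and color subsampling matter before any codec is involved, the following minimal sketch computes the uncompressed bitrate of a video for a few common configurations. It is plain arithmetic on the standard sample counts for 4:4:4, 4:2:2, and 4:2:0; the function name and the 4K/60 fps example are illustrative choices, not settings taken from the paper.

```python
# Samples stored per pixel: one luma sample plus two chroma planes
# whose resolution depends on the subsampling scheme.
SAMPLES_PER_PIXEL = {
    "4:4:4": 3.0,  # chroma at full resolution
    "4:2:2": 2.0,  # chroma halved horizontally
    "4:2:0": 1.5,  # chroma halved horizontally and vertically
}


def raw_bitrate_mbps(width, height, fps, bit_depth, subsampling):
    """Uncompressed video bitrate in megabits per second."""
    bits_per_pixel = SAMPLES_PER_PIXEL[subsampling] * bit_depth
    return width * height * fps * bits_per_pixel / 1e6


# Illustrative comparison (not from the paper): 8-bit 4:2:0 vs 10-bit 4:4:4.
base = raw_bitrate_mbps(3840, 2160, 60, 8, "4:2:0")
rich = raw_bitrate_mbps(3840, 2160, 60, 10, "4:4:4")
print(f"{base:.0f} Mbit/s vs {rich:.0f} Mbit/s, ratio {rich / base:.2f}")
```

The raw data handed to the encoder grows with both factors, which is one reason the choice of bit depth and subsampling can affect compression efficiency and processing energy.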

Hadi

Authors: Annalisa Gallina (UNIPD, Italy), Hadi Amirpour (AAU, Austria), Sara Baldoni (UNIPD, Italy), Giuseppe Valenzise (UPSaclay, France), Federica Battisti (UNIPD, Italy).

Conference: IEEE Visual Communications and Image Processing (IEEE VCIP 2024) – Tokyo, Japan, December 8-11, 2024

Abstract: Measuring the complexity of visual content is crucial in various applications, such as selecting sources to test processing algorithms, designing subjective studies, and efficiently determining the appropriate encoding parameters and bandwidth allocation for streaming. While spatial and temporal complexity measures exist for 2D videos, a geometric complexity measure for 3D content is still lacking. In this paper, we present the first study to characterize the geometric complexity of 3D point clouds. Inspired by existing complexity measures, we propose several compression-based definitions of geometric complexity derived from the rate-distortion curves obtained by compressing a dataset of point clouds using G-PCC. Additionally, we introduce density-based and geometry-based descriptors to predict complexity. Our initial results show that even simple density measures can accurately predict the geometric complexity of point clouds.

Index Terms— Point cloud, complexity, compression, G-PCC.

Authors: Prajit T Rajendran (Université Paris-Saclay), Samira Afzal (Alpen-Adria-Universität Klagenfurt), Vignesh V Menon (Fraunhofer HHI), Christian Timmerer (Alpen-Adria-Universität Klagenfurt)

Conference: IEEE Visual Communications and Image Processing (IEEE VCIP 2024)

Abstract: Optimizing framerate for a given bitrate-spatial resolution pair in adaptive video streaming is essential to maintain perceptual quality while considering decoding complexity. Low framerates at low bitrates reduce compression artifacts and decrease decoding energy. We propose a novel method, Decoding-complexity aware Framerate Prediction (DECODRA), which employs a Variable Framerate Pareto-front approach to predict an optimized framerate that minimizes decoding energy under quality degradation constraints. DECODRA dynamically adjusts the framerate based on current bitrate and spatial resolution, balancing trade-offs between framerate, perceptual quality, and decoding complexity. Extensive experimentation with the Inter-4K dataset demonstrates DECODRA’s effectiveness, yielding an average PSNR and VMAF increase of 0.87 dB and 5.14 points, respectively, for the same bitrate compared to the default 60 fps encoding. Additionally, DECODRA achieves an average reduction in decoding energy consumption of 13.27%, enhancing the viewing experience, extending mobile device battery life, and reducing the energy footprint of streaming services.
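The idea of choosing a framerate from a Pareto front under a quality constraint can be illustrated with a small sketch. This is not the DECODRA implementation: the `Candidate` type, the `max_quality_drop` parameter, and all numbers are hypothetical, and the quality and energy values stand in for whatever predictors the method actually uses.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    fps: int
    quality: float  # hypothetical predicted quality (higher is better)
    energy: float   # hypothetical predicted decoding energy (lower is better)


def pareto_front(cands):
    """Keep candidates that no other candidate beats on both quality and energy."""
    front = []
    for c in cands:
        dominated = any(
            o.quality >= c.quality and o.energy <= c.energy
            and (o.quality > c.quality or o.energy < c.energy)
            for o in cands
        )
        if not dominated:
            front.append(c)
    return front


def pick_framerate(cands, max_quality_drop):
    """Lowest-energy Pareto candidate within max_quality_drop of the best quality."""
    front = pareto_front(cands)
    best_quality = max(c.quality for c in front)
    allowed = [c for c in front if best_quality - c.quality <= max_quality_drop]
    return min(allowed, key=lambda c: c.energy)


# Made-up values for one bitrate-resolution pair.
candidates = [
    Candidate(24, 80.0, 10.0),
    Candidate(30, 84.0, 12.0),
    Candidate(60, 83.0, 20.0),
]
print(pick_framerate(candidates, max_quality_drop=2.0).fps)
```

In this toy data the 60 fps candidate is dominated by the 30 fps one (lower quality at higher energy), so it never reaches the selection step.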

Authors: Mohammad Ghasempour (AAU, Austria), Hadi Amirpour (AAU, Austria), and Christian Timmerer (AAU, Austria)

Abstract: Video streaming has become an integral part of our digital lives, driving the need for efficient video delivery. With the growing demand for seamless video delivery, adaptive video streaming has emerged as a solution to support users with varying device capabilities and network conditions. Traditional adaptive streaming relies on a predetermined set of bitrate-resolution pairs, known as bitrate ladders, for encoding. However, this “one-size-fits-all” approach is suboptimal when dealing with diverse video content. Consequently, per-title encoding approaches dynamically select the bitrate ladder for each content. However, in an era when carbon dioxide emissions have become a paramount concern, it is crucial to consider energy consumption. Therefore, this paper addresses the pressing issue of increasing energy consumption in video streaming by introducing a novel approach, ESTR, which goes beyond traditional quality-centric resolution selection approaches. Instead, ESTR considers both video quality and decoding energy consumption to construct an optimal bitrate ladder tailored to the unique characteristics of each video content. To accomplish this, ESTR encodes each video content using a range of spatial and temporal resolutions, each paired with specific bitrates. It then establishes a maximum acceptable quality drop threshold (τ), carefully selecting resolutions that not only preserve video quality above this threshold but also minimize decoding energy consumption. Our experimental results, at a fixed τ of 2 VMAF steps, demonstrate a 32.87% to 41.86% reduction in decoding energy demand for HEVC-encoded videos across various software decoder implementations and operating systems, with a maximum bitrate increase of 2.52%. Furthermore, on a hardware-accelerated client device, a 46.37% energy saving was achieved during video playback at the expense of a 2.52% bitrate increase. Remarkably, these gains in energy efficiency are achieved while maintaining consistent video quality.
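The selection step described in the abstract can be sketched as a per-bitrate choice. The code below assumes that the quality drop is measured against the best VMAF available at each bitrate and that each encoding comes with a measured or estimated decoding energy; the dictionary keys, the function name `build_ladder`, and the sample values are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def build_ladder(encodings, tau):
    """For each bitrate, keep the lowest-energy encoding whose VMAF is
    within tau of the best VMAF reached at that bitrate."""
    by_bitrate = defaultdict(list)
    for enc in encodings:
        by_bitrate[enc["bitrate"]].append(enc)

    ladder = []
    for bitrate in sorted(by_bitrate):
        group = by_bitrate[bitrate]
        best_vmaf = max(enc["vmaf"] for enc in group)
        acceptable = [enc for enc in group if best_vmaf - enc["vmaf"] <= tau]
        ladder.append(min(acceptable, key=lambda enc: enc["energy"]))
    return ladder


# Made-up encodings: bitrate in kbps, energy in arbitrary units.
encodings = [
    {"bitrate": 1000, "resolution": "1080p", "fps": 60, "vmaf": 70.0, "energy": 5.0},
    {"bitrate": 1000, "resolution": "720p", "fps": 30, "vmaf": 68.5, "energy": 3.0},
    {"bitrate": 1000, "resolution": "540p", "fps": 30, "vmaf": 65.0, "energy": 2.0},
    {"bitrate": 4000, "resolution": "2160p", "fps": 60, "vmaf": 92.0, "energy": 9.0},
    {"bitrate": 4000, "resolution": "1080p", "fps": 60, "vmaf": 91.0, "energy": 6.0},
]
for rung in build_ladder(encodings, tau=2.0):
    print(rung["bitrate"], rung["resolution"], rung["fps"])
```

Setting `tau` to 0 reduces this to a purely quality-driven ladder, which makes the energy and quality trade-off controlled by τ easy to see.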

At the PCS 2024 (Picture Coding Symposium), held in Taichung, Taiwan from June 12-14, Hadi Amirpour received the Best Paper Award for the paper “Beyond Curves and Thresholds – Introducing Uncertainty Estimation To Satisfied User Ratios for Compressed Video” written together with Jingwen Zhu, Raimund Schatz, Patrick Le Callet and Christian Timmerer. Congratulations!

Title: DeepVCA: Deep Video Complexity Analyzer

Authors: Hadi Amirpour (AAU, Klagenfurt, Austria), Klaus Schoeffmann (AAU, Klagenfurt, Austria), Mohammad Ghanbari (University of Essex, UK), Christian Timmerer (AAU, Klagenfurt, Austria)

Abstract: Video streaming and its applications are growing rapidly, making video optimization a primary target for content providers looking to enhance their services. Enhancing the quality of videos requires the adjustment of different encoding parameters such as bitrate, resolution, and frame rate. To avoid brute force approaches for predicting optimal encoding parameters, video complexity features are typically extracted and utilized. To predict optimal encoding parameters effectively, content providers traditionally use unsupervised feature extraction methods, such as ITU-T’s Spatial Information (SI) and Temporal Information (TI) to represent the spatial and temporal complexity of video sequences. Recently, Video Complexity Analyzer (VCA) was introduced to extract DCT-based features to represent the complexity of a video sequence (or parts thereof). These unsupervised features, however, cannot accurately predict video encoding parameters. To address this issue, this paper introduces a novel supervised feature extraction method named DeepVCA, which extracts the spatial and temporal complexity of video sequences using deep neural networks. In this approach, the encoding bits required to encode each frame in intra-mode and inter-mode are used as labels for spatial and temporal complexity, respectively. Initially, we benchmark various deep neural network structures to predict spatial complexity. We then leverage the similarity of features used to predict the spatial complexity of the current frame and its previous frame to rapidly predict temporal complexity. This approach is particularly useful as the temporal complexity may depend not only on the differences between two consecutive frames but also on their spatial complexity. Our proposed approach demonstrates significant improvement over unsupervised methods, especially for temporal complexity. As an example application, we verify the effectiveness of these features in predicting the encoding bitrate and encoding time of video sequences, which are crucial tasks in video streaming. The source code and dataset are available at https://github.com/cd-athena/DeepVCA.
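For context on the unsupervised baselines mentioned in the abstract, the following sketch computes SI and TI as they are commonly defined in ITU-T P.910: SI is the maximum over time of the spatial standard deviation of the Sobel-filtered luma frame, and TI is the maximum over time of the spatial standard deviation of the difference between consecutive frames. It is a pure-Python illustration on nested lists, ignores border pixels, and is unrelated to the DeepVCA code; the function names are illustrative.

```python
from math import hypot
from statistics import pstdev


def sobel_magnitudes(frame):
    """Sobel gradient magnitude at each interior pixel of a 2D luma frame."""
    height, width = len(frame), len(frame[0])
    out = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = (frame[y - 1][x + 1] + 2 * frame[y][x + 1] + frame[y + 1][x + 1]
                  - frame[y - 1][x - 1] - 2 * frame[y][x - 1] - frame[y + 1][x - 1])
            gy = (frame[y + 1][x - 1] + 2 * frame[y + 1][x] + frame[y + 1][x + 1]
                  - frame[y - 1][x - 1] - 2 * frame[y - 1][x] - frame[y - 1][x + 1])
            out.append(hypot(gx, gy))
    return out


def spatial_information(frames):
    """SI: maximum over frames of the std of the Sobel-filtered frame."""
    return max(pstdev(sobel_magnitudes(f)) for f in frames)


def temporal_information(frames):
    """TI: maximum over consecutive frame pairs of the std of their difference.
    Requires at least two frames."""
    return max(
        pstdev([b - a
                for row_prev, row_cur in zip(prev, cur)
                for a, b in zip(row_prev, row_cur)])
        for prev, cur in zip(frames, frames[1:])
    )


# Tiny synthetic frames: a flat frame and one with a vertical edge.
flat = [[10] * 4 for _ in range(4)]
edge = [[0, 0, 0, 255] for _ in range(4)]
print(spatial_information([flat, edge]), temporal_information([flat, edge]))
```

Both features are computed from pixel statistics alone, with no encoder in the loop, which is the sense in which the abstract calls them unsupervised.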

 

ACM MMSys 2024, Bari, Italy, Apr. 15-18, 2024 

Authors: Emanuele Artioli (Alpen-Adria-Universität Klagenfurt, Austria), Farzad Tashtarian (Alpen-Adria-Universität Klagenfurt, Austria), and Christian Timmerer (Alpen-Adria-Universität Klagenfurt, Austria)

Abstract: As the popularity of video streaming entertainment continues to grow, understanding how users engage with the content and react to its changes becomes a critical success factor for every stakeholder. User engagement, i.e., the percentage of video the user watches before quitting, is central to customer loyalty, content personalization, ad relevance, and A/B testing. This paper presents DIGITWISE, a digital twin-based approach for modeling adaptive video streaming engagement. Traditional adaptive bitrate (ABR) algorithms assume that all users react similarly to video streaming artifacts and network issues, neglecting individual user sensitivities. DIGITWISE leverages the concept of a digital twin, a digital replica of a physical entity, to model user engagement based on past viewing sessions. The digital twin receives input about streaming events and utilizes supervised machine learning to predict user engagement for a given session. The system model consists of a data processing pipeline, machine learning models acting as digital twins, and a unified model to predict engagement. DIGITWISE employs the XGBoost model in both digital twins and unified models. The proposed architecture demonstrates the importance of personal user sensitivities, reducing user engagement prediction error by up to 5.8% compared to non-user-aware models. Furthermore, DIGITWISE can optimize content provisioning and delivery by identifying the features that maximize engagement, providing an average engagement increase of up to 8.6%.

Keywords: digital twin, user engagement, xgboost

 

 

ACM Mile High Video 2024 (MHV), Denver, Colorado, February 11-14, 2024

Authors: Daniele Lorenzi (Alpen-Adria-Universität Klagenfurt, Austria), Minh Nguyen (Alpen-Adria-Universität Klagenfurt, Austria), Farzad Tashtarian (Alpen-Adria-Universität Klagenfurt, Austria), and Christian Timmerer (Alpen-Adria-Universität Klagenfurt, Austria)

Abstract: HTTP Adaptive Streaming (HAS) is the de-facto solution for delivering video content over the Internet. The climate crisis has highlighted the environmental impact of information and communication technologies (ICT) solutions and the need for green solutions to reduce ICT’s carbon footprint. As video streaming dominates Internet traffic, research in this direction is vital now more than ever. HAS relies on Adaptive BitRate (ABR) algorithms, which dynamically choose suitable video representations to accommodate device characteristics and network conditions. ABR algorithms typically prioritize video quality, ignoring the energy impact of their decisions. Consequently, they often select the video representation with the highest bitrate under good network conditions, thereby increasing energy consumption. This is problematic, especially for energy-limited devices, because it affects the device’s battery life and the user experience. To address the aforementioned issues, we propose E-WISH, a novel energy-aware ABR algorithm, which extends the already-existing WISH algorithm to consider energy consumption while selecting the quality for the next video segment. According to the experimental findings, E-WISH shows the ability to improve Quality of Experience (QoE) by up to 52% according to the ITU-T P.1203 model (mode 0) while simultaneously reducing energy consumption by up to 12% with respect to state-of-the-art approaches.

Keywords: HTTP adaptive streaming, Energy, Adaptive Bitrate (ABR), DASH
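The general idea of letting energy influence an ABR decision can be illustrated with a generic weighted-cost selection. This is not the E-WISH formulation, which the abstract does not spell out: the cost terms, the `energy_weight` parameter, and the per-representation quality and energy values are all assumptions made for illustration.

```python
def choose_representation(reps, throughput_kbps, energy_weight):
    """Pick a representation by trading quality loss against energy.

    reps: list of (bitrate_kbps, quality, energy) tuples (hypothetical values).
    energy_weight: 0 means quality only, 1 means energy only.
    """
    # Only consider representations the network can sustain; if none
    # fits, fall back to the lowest-bitrate representation.
    feasible = [r for r in reps if r[0] <= throughput_kbps] or [min(reps)]
    max_quality = max(r[1] for r in feasible)
    max_energy = max(r[2] for r in feasible)

    def cost(rep):
        quality_loss = (max_quality - rep[1]) / max_quality
        energy_cost = rep[2] / max_energy
        return (1 - energy_weight) * quality_loss + energy_weight * energy_cost

    return min(feasible, key=cost)


# Made-up representations: (bitrate in kbps, quality score, energy in arbitrary units).
reps = [(1000, 60.0, 1.0), (3000, 80.0, 1.5), (6000, 90.0, 3.0)]
print(choose_representation(reps, throughput_kbps=8000, energy_weight=0.5))
```

With `energy_weight` set to 0 the sketch behaves like the quality-first ABR logic the abstract criticizes, always taking the highest feasible bitrate; raising the weight shifts the choice toward cheaper representations.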

 

With the current popularity of ECO in the Asia-Pacific (APAC) region, the Bitmovin team in APAC, led by Adrian Britton, expressed an interest in the energy-aware research initiatives conducted within the GAIA project in Austria. Following an introductory meeting between the APAC team and AAU on October 17, 2023, both teams decided to meet in person on November 21, 2023, to explore the topics further.

The meeting proved to be highly productive, centering around two recent research topics:

– VE-Match: Video Encoding Matching-Based Model in the Cloud and Edge (presented by Samira Afzal & Narges Mehran)

– Energy-aware Spatial and Temporal Resolution Selection for Per-Title (presented by Mohammad Ghasempour & Hadi Amirpour)

Each presentation prompted many interesting questions, driven by customer and provider requirements and by the outlook for climate-friendly video streaming in the cloud and at the edge. The fruitful discussions opened up avenues for future exploration in this dynamic field.

IEEE Transactions on Network and Service Management

Authors: Reza Farahani, Ekrem Cetinkaya, Christian Timmerer, Mohammad Shojafar, Mohammad Ghanbari, and Hermann Hellwagner

Abstract: Recent years have witnessed video streaming demands evolve into one of the most popular Internet applications. With the ever-increasing personalized demands for high-definition and low-latency video streaming services, network-assisted video streaming schemes employing modern networking paradigms have become a promising complementary solution in the HTTP Adaptive Streaming (HAS) context. The emergence of such techniques addresses long-standing challenges of enhancing users’ Quality of Experience (QoE), end-to-end (E2E) latency, as well as network utilization. However, designing a cost-effective, scalable, and flexible network-assisted video streaming architecture that supports the aforementioned requirements for live streaming services is still an open challenge. This article leverages novel networking paradigms, i.e., edge computing and Network Function Virtualization (NFV), and promising video solutions, i.e., HAS, Video Super-Resolution (SR), and Distributed Video Transcoding (TR), to introduce A Latency- and cost-aware hybrId P2P-CDN framework for liVe video strEaming (ALIVE). We first introduce the ALIVE multi-layer architecture and design an action tree that considers all feasible resources (i.e., storage, computation, and bandwidth) provided by peers, edge, and CDN servers for serving peer requests with acceptable latency and quality. We then formulate the problem as a Mixed Integer Linear Programming (MILP) optimization model executed at the edge of the network. To alleviate the optimization model’s high time complexity, we propose a lightweight heuristic, namely, Greedy-Based Algorithm (GBA). Finally, we (i) design and instantiate a large-scale cloud-based testbed including 350 HAS players, (ii) deploy ALIVE on it, and (iii) conduct a series of experiments to evaluate the performance of ALIVE in various scenarios. Experimental results indicate that ALIVE (i) improves the users’ QoE by at least 22%, (ii) decreases incurred cost of the streaming service provider by at least 34%, (iii) shortens clients’ serving latency by at least 40%, (iv) enhances edge server energy consumption by at least 31%, and (v) reduces backhaul bandwidth usage by at least 24% compared to baseline approaches.

Keywords: HTTP Adaptive Streaming (HAS); Edge Computing; Network Function Virtualization (NFV); Content Delivery Network (CDN); Peer-to-Peer (P2P); Quality of Experience (QoE); Video Transcoding; Video Super-Resolution.