Special issue on Open Media Compression: Overview, Design Criteria, and Outlook on Emerging Standards

Proceedings of the IEEE, vol. 109, no. 9, Sept. 2021

By CHRISTIAN TIMMERER, Senior Member IEEE
Guest Editor
MATHIAS WIEN, Member IEEE
Guest Editor
LU YU, Senior Member IEEE
Guest Editor
AMY REIBMAN, Fellow IEEE
Guest Editor

Abstract: Multimedia content (i.e., video, image, audio) is responsible for the majority of today’s Internet traffic, and its share is expected to grow beyond 80% in the near future. For more than 30 years, international standards have provided tools for interoperability and have been both source and sink for challenging research activities in the domain of multimedia compression and system technologies. The goal of this special issue is to review those standards and focus on (i) the technology developed in the context of these standards and (ii) research questions addressing aspects of these standards which are left open for competition by both academia and industry.

Index Terms—Open Media Standards, MPEG, JPEG, JVET, AOM, Computational Complexity

C. Timmerer, M. Wien, L. Yu and A. Reibman, “Special issue on Open Media Compression: Overview, Design Criteria, and Outlook on Emerging Standards,” in Proceedings of the IEEE, vol. 109, no. 9, pp. 1423-1434, Sept. 2021, doi: 10.1109/JPROC.2021.3098048.

Title: Where to Encode: A Performance Analysis of Intel x86 and Arm-based Amazon EC2 Instances

20-23 September 2021 // Innsbruck, Austria // Online Conference

Link: IEEE eScience 2021

Authors: Roland Mathá*, Dragi Kimovski*, Anatoliy Zabrovskiy*‡, Christian Timmerer*†, Radu Prodan*
Institute of Information Technology (ITEC), University of Klagenfurt, Austria*
Bitmovin, Klagenfurt, Austria†
Petrozavodsk State University, Petrozavodsk, Russia‡

Abstract: Video streaming has become an integral part of the Internet. To efficiently utilise the limited network bandwidth, it is essential to encode the video content. However, encoding is a computationally intensive task, involving high-performance resources provided by private infrastructures or public clouds. Public clouds, such as Amazon EC2, provide a large portfolio of services and instances optimized for specific purposes and budgets. The majority of Amazon’s instances use x86 processors, such as Intel Xeon or AMD EPYC. However, following the recent trends in computer architecture, Amazon introduced Arm-based instances that promise up to 40% better cost-performance ratio than comparable x86 instances for specific workloads. We evaluate in this paper the video encoding performance of x86 and Arm instances of four instance families using the latest FFmpeg version and two video codecs. We examine the impact of the encoding parameters, such as different presets and bitrates, on the time and cost for encoding. Our experiments reveal that Arm instances show high time and cost saving potential of up to 33.63% for specific bitrates and presets, especially for the x264 codec. However, the x86 instances are more general and achieve low encoding times, regardless of the codec.
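
The time-versus-cost comparison described in the abstract can be illustrated with a small calculation. The following is a minimal sketch, assuming per-second billing; the function names, encoding times, and hourly prices are hypothetical placeholders and are not measurements or prices from the paper.

```python
def encoding_cost(encoding_seconds: float, hourly_price_usd: float) -> float:
    """Cost of one encoding job, assuming the instance is billed per second."""
    return encoding_seconds / 3600.0 * hourly_price_usd


def relative_saving(baseline: float, candidate: float) -> float:
    """Saving of candidate relative to baseline in percent (negative means more expensive)."""
    return (baseline - candidate) / baseline * 100.0


# Hypothetical numbers for illustration only (not taken from the paper).
x86_cost = encoding_cost(encoding_seconds=120.0, hourly_price_usd=0.68)
arm_cost = encoding_cost(encoding_seconds=130.0, hourly_price_usd=0.54)

time_saving = relative_saving(120.0, 130.0)
cost_saving = relative_saving(x86_cost, arm_cost)
print(f"time saving: {time_saving:.2f}%, cost saving: {cost_saving:.2f}%")
```

With these placeholder values the Arm job is slower but still cheaper, which is why time and cost have to be reported separately for each preset and bitrate.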

Index Terms—Amazon EC2, Arm instances, AVC, Cloud computing, FFmpeg, Graviton2, HEVC, Performance analysis, Video encoding.

Link: IEEE Global Communications Conference 2021

7-11 December 2021 // Madrid, Spain // Hybrid: In-Person and Virtual Conference Connecting Cultures around the Globe

Authors: F. Tashtarian*, R. Falanji‡, A. Bentaleb+, A. Erfanian*, P. S. Mashhadi§,
C. Timmerer*, H. Hellwagner*, R. Zimmermann+
Christian Doppler Laboratory ATHENA, Institute of Information Technology, Alpen-Adria-Universität Klagenfurt, Austria*
Department of Mathematical Science, Sharif University of Technology, Tehran, Iran‡
Department of Computer Science, School of Computing, National University of Singapore (NUS)+
Center for Applied Intelligent Systems Research (CAISR), Halmstad University, Sweden§

Abstract: Recent years have seen tremendous growth in HTTP adaptive live video traffic over the Internet. In the presence of highly dynamic network conditions and diverse request patterns, existing yet simple hand-crafted heuristic approaches for serving client requests at the network edge might incur a large overhead and significant increase in time complexity. Therefore, these approaches might fail in delivering acceptable Quality of Experience (QoE) to end users. To bridge this gap, we propose ROPL, a learning-based client request management solution at the edge that leverages the power of the recent breakthroughs in deep reinforcement learning to serve requests of concurrent users joining various HTTP-based live video channels. ROPL is able to react quickly to any changes in the environment, making accurate decisions to serve clients’ requests, which results in satisfactory user QoE. We validate the efficiency of ROPL through trace-driven simulations and a real-world setup. Experimental results from real-world scenarios confirm that ROPL outperforms existing heuristic-based approaches in terms of QoE by a factor of up to 3.7×.

Index Terms—Network Edge; Request Serving; HTTP Live Streaming; Low Latency; QoE; Deep Reinforcement Learning.

Title: End-to-end Quality of Experience Evaluation for HTTP Adaptive Streaming

ACM MM’21: The 29th ACM International Conference on Multimedia

October 20-24, 2021, Chengdu, China

Babak Taraghi (Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität Klagenfurt)

Abstract: Exponential growth in multimedia streaming traffic over the Internet motivates the research and further investigation of the user’s perceived quality of such services. Enhancing the quality experienced by users becomes more important when service providers compete to establish superiority by gaining more subscribers or customers. Quality of Experience (QoE) enhancement would not be possible without an authentic and accurate assessment of the streaming sessions. HTTP Adaptive Streaming (HAS) is today’s prevailing technique to deliver the highest possible audio and video content quality to the users. An end-to-end evaluation of QoE in HAS covers the precise measurement of the metrics that affect the perceived quality, e.g., startup delay, stall events, and delivered media quality. Improving these metrics could limit the service’s scalability, which is an important factor in real-world scenarios. In this study, we will investigate the stated metrics, best practices and evaluation methods, and available techniques with an aim to (i) design and develop practical and scalable measurement tools and prototypes, (ii) provide a better understanding of current technologies and techniques (e.g., Adaptive Bitrate algorithms), (iii) conduct in-depth research on the significant metrics in a way that improvements of QoE with scalability in mind would be feasible, and finally, (iv) provide a comprehensive QoE model which outperforms state-of-the-art models.

Keywords: HTTP Adaptive Streaming; Quality of Experience; Subjective Evaluation; Objective Evaluation; Adaptive Bitrate; QoE model.

Title: CTU Depth Decision Algorithms for HEVC: A Survey

Link: Signal Processing: Image Communication

[PDF]

Ekrem Çetinkaya* (Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität Klagenfurt), Hadi Amirpour* (Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität Klagenfurt), Mohammad Ghanbari (Christian Doppler Laboratory ATHENA, University of Essex), and Christian Timmerer (Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität Klagenfurt)

*These authors contributed equally to this work.

Abstract: High Efficiency Video Coding (HEVC) surpasses its predecessors in encoding efficiency by introducing new coding tools at the cost of an increased encoding time-complexity. The Coding Tree Unit (CTU) is the main building block used in HEVC. In the HEVC standard, frames are divided into CTUs with a predetermined size of up to 64 × 64 pixels. Each CTU is then divided recursively into a number of equally sized square areas, known as Coding Units (CUs). Although this diversity of frame partitioning increases encoding efficiency, it also causes an increase in the time complexity due to the increased number of ways to find the optimal partitioning. To address this complexity, numerous algorithms have been proposed to eliminate unnecessary searches during CTU partitioning by exploiting the correlation in the video. In this paper, existing CTU depth decision algorithms for HEVC are surveyed. These algorithms are categorized into two groups, namely statistics and machine learning approaches. Statistics approaches are further subdivided into neighboring and inherent approaches. Neighboring approaches exploit the similarity between adjacent CTUs to limit the depth range of the current CTU, while inherent approaches use only the available information within the current CTU. Machine learning approaches try to extract and exploit similarities implicitly. Traditional methods like support vector machines or random forests use manually selected features, while recently proposed deep learning methods extract features during training. Finally, this paper discusses extending these methods to more recent video coding formats such as Versatile Video Coding (VVC) and AOMedia Video 1 (AV1).
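
The recursive partitioning and the effect of limiting the depth range can be sketched as follows. This is a simplified illustration, not an algorithm from the survey: the function names and the toy cost function are hypothetical stand-ins for an encoder's rate-distortion cost, and the sketch assumes a 64 × 64 CTU with depths 0 to 3 (CU sizes 64, 32, 16, 8).

```python
from typing import Callable, Dict, List, Tuple

Block = Tuple[int, int, int]  # (x, y, size) of one coding unit


def partition_ctu(x: int, y: int, size: int, depth: int,
                  cost: Callable[[int, int, int], float],
                  min_depth: int = 0, max_depth: int = 3) -> Tuple[float, List[Block]]:
    """Exhaustive quadtree search restricted to depths in [min_depth, max_depth]."""
    can_split = depth < max_depth and size > 8
    split_cost, split_blocks = 0.0, []
    if can_split:
        half = size // 2
        for dy in (0, half):
            for dx in (0, half):
                c, blocks = partition_ctu(x + dx, y + dy, half, depth + 1,
                                          cost, min_depth, max_depth)
                split_cost += c
                split_blocks += blocks
    if can_split and depth < min_depth:
        return split_cost, split_blocks  # depths below min_depth are never evaluated
    own_cost = cost(x, y, size)
    if can_split and split_cost < own_cost:
        return split_cost, split_blocks
    return own_cost, [(x, y, size)]


def make_counting_cost() -> Tuple[Callable[[int, int, int], float], Dict[str, int]]:
    """Toy cost (hypothetical) that also counts how many CUs were evaluated."""
    calls = {"n": 0}

    def cost(x: int, y: int, size: int) -> float:
        calls["n"] += 1
        return 1.0 + size * size / 64.0

    return cost, calls


full_cost_fn, full_calls = make_counting_cost()
full_result = partition_ctu(0, 0, 64, 0, full_cost_fn)

limited_cost_fn, limited_calls = make_counting_cost()
limited_result = partition_ctu(0, 0, 64, 0, limited_cost_fn, min_depth=1, max_depth=2)

print(full_calls["n"], limited_calls["n"])
```

The full search evaluates 1 + 4 + 16 + 64 = 85 candidate CUs, while restricting the range to depths 1 and 2 evaluates only 4 + 16 = 20. A depth decision algorithm aims to predict such a range so that the skipped evaluations rarely contain the optimal partitioning.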

Keywords: HEVC, Coding Tree Unit, Complexity, CTU Partitioning, Statistics, Machine Learning

Agata and Michał Barciś and their fellow researcher from RTB House in Poland, Michał Jagielski, competed in the Drone Bot Contest at the Deep Drone Challenge in Ingolstadt, Germany on Saturday 7 August 2021.

The competition is organised by start-up incubator brigkAIR and Europe’s largest aircraft manufacturer Airbus. The three young scientists were delighted to receive a prize of 25,000 Euros.

ACM Multimedia Systems Conference (MMSys) 2021 | Doctoral Symposium

September 28 – October 01, 2021 | Istanbul, Turkey

Conference Website

Authors: M. Barciś, A. Barciś, N. Tsiogkas, H. Hellwagner.

Title: Information Distribution in Multi-Robot Systems: Generic, Utility-Aware Optimization Middleware.

Frontiers in Robotics and AI 8:685105, July 2021.

This work addresses the problem of what information is worth sending in a multi-robot system under generic constraints, e.g., limited throughput or energy. Our decision method is based on Monte Carlo Tree Search. It is designed as a transparent middleware that can be integrated into existing systems to optimize communication among robots. Furthermore, we introduce techniques to reduce the decision space of this problem to further improve the performance. We evaluate our approach using a simulation study and demonstrate its feasibility in a real-world environment by realizing a proof of concept in ROS 2 on mobile robots.

Published paper

Authors: Alireza Erfanian* (Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität Klagenfurt), Hadi Amirpour* (Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität Klagenfurt), Farzad Tashtarian (Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität Klagenfurt), Christian Timmerer (Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität Klagenfurt), Hermann Hellwagner (Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität Klagenfurt)

*These authors contributed equally to this work.

Link: IEEE Access

Abstract: Due to the growing demand for video streaming services, providers have to deal with increasing resource requirements for increasingly heterogeneous environments. To mitigate this problem, many works have been proposed which aim to (i) improve cloud/edge caching efficiency, (ii) use computation power available in the cloud/edge for on-the-fly transcoding, and (iii) optimize the trade-off among various cost parameters, e.g., storage, computation, and bandwidth. In this paper, we propose LwTE, a novel Light-weight Transcoding approach at the Edge, in the context of HTTP Adaptive Streaming (HAS). During the encoding process of a video segment at the origin side, computationally intense search processes are going on. The main idea of LwTE is to store the optimal results of these search processes as metadata for each video bitrate and reuse them at the edge servers to reduce the required time and computational resources for on-the-fly transcoding. LwTE enables us to store only the highest bitrate plus corresponding metadata (of very small size) for unpopular video segments/bitrates. In this way, in addition to the significant reduction in bandwidth and storage consumption, the required time for on-the-fly transcoding of a requested segment is remarkably decreased by utilizing its corresponding metadata; unnecessary search processes are avoided. Popular video segments/bitrates are being stored. We investigate our approach for Video-on-Demand (VoD) streaming services by optimizing storage and computation (transcoding) costs at the edge servers and then compare it to conventional methods (store all bitrates, partial transcoding). The results indicate that our approach reduces the transcoding time by at least 80% and decreases the aforementioned costs by 12% to 70% compared to the state-of-the-art approaches.
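
The store-versus-transcode trade-off described in the abstract can be sketched with a simple per-representation rule. This is an illustrative simplification, not the optimization model of the paper: it assumes the highest bitrate is always stored, and all names, sizes, request counts, and prices below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Representation:
    name: str
    size_mb: float            # size of the encoded segment at this bitrate
    metadata_mb: float        # size of the stored search-result metadata
    expected_requests: float  # expected requests within the billing period
    transcode_seconds: float  # metadata-assisted transcoding time per request


def cheaper_to_store(rep: Representation,
                     storage_usd_per_mb: float,
                     compute_usd_per_second: float) -> bool:
    """Return True if storing the segment costs no more than metadata plus on-the-fly transcoding."""
    store_cost = rep.size_mb * storage_usd_per_mb
    transcode_cost = (rep.metadata_mb * storage_usd_per_mb
                      + rep.expected_requests * rep.transcode_seconds * compute_usd_per_second)
    return store_cost <= transcode_cost


# Hypothetical values for illustration only.
popular = Representation("seg1@1000k", size_mb=2.0, metadata_mb=0.05,
                         expected_requests=500, transcode_seconds=0.4)
unpopular = Representation("seg9@1000k", size_mb=2.0, metadata_mb=0.05,
                           expected_requests=2, transcode_seconds=0.4)

for rep in (popular, unpopular):
    decision = "store" if cheaper_to_store(rep, 0.001, 0.00002) else "transcode on demand"
    print(rep.name, "->", decision)
```

Under these placeholder prices the frequently requested representation is stored, while the rarely requested one is kept only as metadata and transcoded from the highest bitrate when needed.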

Keywords: Video streaming, transcoding, video on demand, edge computing.

Title: WISH: User-centric Bitrate Adaptation for HTTP Adaptive Streaming on Mobile Devices

IEEE 23rd International Workshop on Multimedia Signal Processing (MMSP)

October 06-08, Tampere, Finland

Authors: Minh Nguyen (Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität Klagenfurt), Ekrem Çetinkaya (Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität Klagenfurt), Hermann Hellwagner (Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität Klagenfurt), and Christian Timmerer (Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität Klagenfurt)

Abstract: Recently, mobile devices have become paramount in online video streaming. Adaptive bitrate (ABR) algorithms of players responsible for selecting the quality of the videos face critical challenges in providing a high Quality of Experience (QoE) for end users. One open issue is how to ensure the optimal experience for heterogeneous devices in the context of extreme variation of mobile broadband networks. Additionally, end users may have different priorities on video quality and data usage (i.e., the amount of data downloaded to the devices through the mobile networks). A generic mechanism for players that enables specification of various policies to meet end users’ needs is still missing. In this paper, we propose a weighted sum model, namely WISH, that yields high QoE of the video and allows end users to express their preferences among different parameters (i.e., data usage, stall events, and video quality) of video streaming. WISH has been implemented in ExoPlayer, a popular player used in many mobile applications. The experimental results show that WISH improves the QoE by up to 17.6% while saving 36.4% of data usage compared to state-of-the-art ABR algorithms and provides dynamic adaptation to end users’ requirements.
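
A weighted sum model of this kind can be sketched as follows. This is a minimal illustration and not the WISH cost functions from the paper: the three penalty terms, the function name `choose_bitrate`, and all numbers are assumptions chosen only to show how user weights shift the selected bitrate.

```python
from typing import Sequence


def choose_bitrate(bitrates_kbps: Sequence[int], throughput_kbps: float,
                   buffer_s: float, segment_s: float,
                   w_data: float, w_stall: float, w_quality: float) -> int:
    """Pick the bitrate with the lowest weighted sum of three hypothetical penalties."""
    top = max(bitrates_kbps)

    def score(bitrate: int) -> float:
        data_penalty = bitrate / top                       # more data downloaded
        download_s = bitrate * segment_s / throughput_kbps  # estimated download time
        stall_penalty = max(0.0, download_s - buffer_s) / segment_s
        quality_penalty = 1.0 - bitrate / top              # lower quality delivered
        return (w_data * data_penalty
                + w_stall * stall_penalty
                + w_quality * quality_penalty)

    return min(bitrates_kbps, key=score)


ladder = [500, 1000, 2500, 5000]  # hypothetical bitrate ladder in kbps

quality_first = choose_bitrate(ladder, throughput_kbps=3000, buffer_s=4, segment_s=4,
                               w_data=0.1, w_stall=0.5, w_quality=0.4)
data_saver = choose_bitrate(ladder, throughput_kbps=3000, buffer_s=4, segment_s=4,
                            w_data=0.6, w_stall=0.3, w_quality=0.1)
print(quality_first, data_saver)
```

With the same network conditions, the quality-oriented weights select 2500 kbps (5000 kbps would risk a stall), while the data-saving weights select 500 kbps.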

Keywords: ABR Algorithms, HTTP Adaptive Streaming, ITU-T P.1203, WISH