Multimedia Communication

The Quality of Experience (QoE) is well-defined in QUALINET white papers [here, here], but its assessment and metrics remain subjects of ongoing research. The aim of this workshop on “Quality of Immersive Media: Assessment and Metrics” is to provide a forum for researchers and practitioners to discuss the latest findings in this field. The scope of this workshop is (i) to raise awareness about MPEG efforts in the context of quality of immersive visual media and (ii) to invite experts (outside of MPEG) to present new techniques relevant to this workshop.

Quality assessments in the context of the MPEG standardization process typically serve two purposes: (1) to foster decision-making on the tool adoptions during the standardization process and (2) to validate the outcome of a standardization effort compared to an established anchor (i.e., for verification testing).

We kindly invite you to the first online MPEG AG 5 Workshop on Quality of Immersive Media: Assessment and Metrics as follows.

Logistics (online):

Program/Speakers:

15:00-15:10: Joel Jung & Christian Timmerer (AhG co-chairs): Welcome notice

15:10-15:30: Mathias Wien (AG 5 convenor): MPEG Visual Quality Assessment: Tasks and Perspectives
Abstract: The Advisory Group on MPEG Visual Quality Assessment (ISO/IEC JTC1 SC29/AG5) was founded in 2020 with the goal of selecting and designing subjective quality evaluation methodologies and objective quality metrics for the assessment of visual coding technologies in the context of the MPEG standardization work. This talk presents the group's current work items as well as its perspectives and first achievements.

15:30-15:50: Aljosa Smolic: Perception and Quality of Immersive Media
Abstract: Interest in immersive media increased significantly over recent years. Besides applications in entertainment, culture, health, industry, etc., telepresence and remote collaboration gained importance due to the pandemic and climate crisis. Immersive media have the potential to increase social integration and to reduce greenhouse gas emissions. As a result, technologies along the whole pipeline from capture to display are maturing and applications are becoming available, creating business opportunities. One aspect of immersive technologies that is still relatively undeveloped is the understanding of perception and quality, including subjective and objective assessment. The interactive nature of immersive media poses new challenges to estimation of saliency or visual attention, and to the development of quality metrics. The V-SENSE lab of Trinity College Dublin addresses these questions in current research. This talk will highlight corresponding examples in 360 VR video, light fields, volumetric video and XR.

15:50-16:00: Break/Discussions

16:00-16:20: Jesús Gutiérrez: Quality assessment of immersive media: Recent activities within VQEG
Abstract: This presentation will provide an overview of the recent activities on quality assessment of immersive media within the Video Quality Experts Group (VQEG), particularly within the Immersive Media Group (IMG). Among other efforts, outcomes will be presented from the cross-lab test (carried out by ten different labs) to assess and validate subjective evaluation methodologies for 360º videos, which was instrumental in the development of ITU-T Recommendation P.919. In addition, insights will be provided on current plans for evaluating the Quality of Experience of immersive communication systems, considering technologies such as 360º video, point clouds, and free-viewpoint video.

16:20-16:40: Alexander Raake: <to-be-provided>

16:40-17:00: <to-be-provided>

17:00: Conclusions

Title: Where to Encode: A Performance Analysis of Intel x86 and Arm-based Amazon EC2 Instances

20-23 September 2021 // Innsbruck, Austria // Online Conference

Link: IEEE eScience 2021

Authors: Roland Mathá*, Dragi Kimovski*, Anatoliy Zabrovskiy*‡, Christian Timmerer*†, Radu Prodan*
Institute of Information Technology (ITEC), University of Klagenfurt, Austria*
Bitmovin, Klagenfurt, Austria†
Petrozavodsk State University, Petrozavodsk, Russia‡

Abstract: Video streaming has become an integral part of the Internet. To efficiently utilise the limited network bandwidth, it is essential to encode the video content. However, encoding is a computationally intensive task, involving high-performance resources provided by private infrastructures or public clouds. Public clouds, such as Amazon EC2, provide a large portfolio of services and instances optimized for specific purposes and budgets. The majority of Amazon’s instances use x86 processors, such as Intel Xeon or AMD EPYC. However, following recent trends in computer architecture, Amazon introduced Arm-based instances that promise up to 40% better cost-performance ratio than comparable x86 instances for specific workloads. In this paper, we evaluate the video encoding performance of x86 and Arm instances of four instance families using the latest FFmpeg version and two video codecs. We examine the impact of encoding parameters, such as different presets and bitrates, on the time and cost of encoding. Our experiments reveal that Arm instances show high time and cost saving potential of up to 33.63% for specific bitrates and presets, especially for the x264 codec. However, the x86 instances are more general and achieve low encoding times regardless of the codec.

Index Terms—Amazon EC2, Arm instances, AVC, Cloud computing, FFmpeg, Graviton2, HEVC, Performance analysis, Video encoding.
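
To make the benchmarking idea concrete, here is a minimal sketch of how one might time a single FFmpeg encode for a given codec, preset, and bitrate and derive a cost estimate from an instance's hourly price. The input file name and the price are placeholders, and this is not the harness used in the paper.

```python
import subprocess
import time

# Illustrative sketch (not the paper's benchmarking setup): time one FFmpeg
# encode for a given preset and target bitrate, then derive a cost estimate
# from an assumed per-hour instance price.
def encode_and_cost(src, codec="libx264", preset="medium",
                    bitrate="4500k", usd_per_hour=0.154):
    """Encode `src` once and return (seconds, estimated USD).
    `usd_per_hour` is a placeholder price, not an official figure."""
    cmd = [
        "ffmpeg", "-y", "-i", src,
        "-c:v", codec, "-preset", preset,
        "-b:v", bitrate,
        "-an", "-f", "null", "/dev/null",   # discard output; we only measure time
    ]
    start = time.time()
    subprocess.run(cmd, check=True, capture_output=True)
    seconds = time.time() - start
    return seconds, seconds / 3600.0 * usd_per_hour

if __name__ == "__main__":
    # "bbb_1080p.mp4" is a hypothetical test sequence.
    for preset in ("ultrafast", "medium", "veryslow"):
        t, cost = encode_and_cost("bbb_1080p.mp4", preset=preset)
        print(f"{preset:>10}: {t:7.1f} s, ~${cost:.4f}")
```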

Link: IEEE Global Communications Conference 2021

7-11 December 2021 // Madrid, Spain // Hybrid: In-Person and Virtual Conference Connecting Cultures around the Globe

Authors: F. Tashtarian*, R. Falanji‡, A. Bentaleb+, A. Erfanian*, P. S. Mashhadi§,
C. Timmerer*, H. Hellwagner*, R. Zimmermann+
Christian Doppler Laboratory ATHENA, Institute of Information Technology, Alpen-Adria-Universität Klagenfurt, Austria*
Department of Mathematical Science, Sharif University of Technology, Tehran, Iran‡
Department of Computer Science, School of Computing, National University of Singapore (NUS)+
Center for Applied Intelligent Systems Research (CAISR), Halmstad University, Sweden§

Abstract: Recent years have seen tremendous growth in HTTP adaptive live video traffic over the Internet. In the presence of highly dynamic network conditions and diverse request patterns, existing simple hand-crafted heuristic approaches for serving client requests at the network edge might incur a large overhead and a significant increase in time complexity. Therefore, these approaches might fail to deliver an acceptable Quality of Experience (QoE) to end users. To bridge this gap, we propose ROPL, a learning-based client request management solution at the edge that leverages recent breakthroughs in deep reinforcement learning to serve requests of concurrent users joining various HTTP-based live video channels. ROPL is able to react quickly to changes in the environment, making accurate decisions to serve client requests, which results in a satisfactory user QoE. We validate the efficiency of ROPL through trace-driven simulations and a real-world setup. Experimental results from real-world scenarios confirm that ROPL outperforms existing heuristic-based approaches in terms of QoE by a factor of up to 3.7×.

Index Terms—Network Edge; Request Serving; HTTP Live Streaming; Low Latency; QoE; Deep Reinforcement Learning.
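
As a rough illustration of what a learning-based request-management loop can look like, the sketch below maps an observed edge state to a serving action through a stand-in policy. The state features, action names, and linear policy are assumptions for illustration only; they do not reflect ROPL's actual design, which the abstract does not detail.

```python
import numpy as np

# Hypothetical illustration of a learned-policy decision step at the edge.
ACTIONS = ["serve_from_edge_cache", "fetch_from_origin", "transcode_at_edge"]

def observe_state(channel):
    """Placeholder observation: request rate, edge load, and origin RTT."""
    return np.array([channel["req_rate"], channel["edge_load"], channel["origin_rtt_ms"]])

def policy(state, weights):
    """Stand-in for a trained deep-RL policy: one linear layer plus softmax.
    A real agent would learn `weights` from interaction, not draw them randomly."""
    logits = weights @ state
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return ACTIONS[int(np.argmax(probs))]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = rng.normal(size=(len(ACTIONS), 3))  # would come from training
    channel = {"req_rate": 120.0, "edge_load": 0.6, "origin_rtt_ms": 45.0}
    print(policy(observe_state(channel), weights))
```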

Title: End-to-end Quality of Experience Evaluation for HTTP Adaptive Streaming

ACM MM’21: The 29th ACM International Conference on Multimedia

October 20-24, 2021, Chengdu, China

Babak Taraghi (Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität Klagenfurt)

Abstract: Exponential growth in multimedia streaming traffic over the Internet motivates the research and further investigation of the user’s perceived quality of such services. Enhancing the quality experienced by users becomes more substantial when service providers compete on establishing superiority by gaining more subscribers or customers. Quality of Experience (QoE) enhancement would not be possible without an authentic and accurate assessment of the streaming sessions. HTTP Adaptive Streaming (HAS) is today’s prevailing technique to deliver the highest possible audio and video content quality to users. An end-to-end evaluation of QoE in HAS covers the precise measurement of the metrics that affect the perceived quality, e.g., startup delay, stall events, and delivered media quality. Improving these metrics could limit the service’s scalability, which is an important factor in real-world scenarios. In this study, we will investigate the stated metrics, best practices and evaluation methods, and available techniques with the aim to (i) design and develop practical and scalable measurement tools and prototypes, (ii) provide a better understanding of current technologies and techniques (e.g., Adaptive Bitrate algorithms), (iii) conduct in-depth research on the significant metrics so that QoE improvements remain feasible with scalability in mind, and finally, (iv) provide a comprehensive QoE model which outperforms state-of-the-art models.

Keywords: HTTP Adaptive Streaming; Quality of Experience; Subjective Evaluation; Objective Evaluation; Adaptive Bitrate; QoE model.
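
The metrics named above (startup delay, stall events, delivered quality) can be derived from a player event log. Below is a small, hedged sketch using a hypothetical event schema; it illustrates the end-to-end measurement idea rather than the author's tooling.

```python
from dataclasses import dataclass

# Hypothetical player event schema; field names are assumptions.
@dataclass
class Event:
    t: float            # wall-clock time in seconds
    kind: str           # "request", "playback_start", "stall_start", "stall_end", "quality"
    value: float = 0.0  # e.g., bitrate in kbps for "quality" events

def session_metrics(events):
    """Compute startup delay, stall statistics, and average delivered bitrate."""
    start_req = next(e.t for e in events if e.kind == "request")
    first_play = next(e.t for e in events if e.kind == "playback_start")
    stalls, stall_time, open_stall = 0, 0.0, None
    bitrates = []
    for e in events:
        if e.kind == "stall_start":
            stalls, open_stall = stalls + 1, e.t
        elif e.kind == "stall_end" and open_stall is not None:
            stall_time += e.t - open_stall
            open_stall = None
        elif e.kind == "quality":
            bitrates.append(e.value)
    return {
        "startup_delay_s": first_play - start_req,
        "stall_count": stalls,
        "total_stall_s": stall_time,
        "avg_bitrate_kbps": sum(bitrates) / len(bitrates) if bitrates else 0.0,
    }

log = [Event(0.0, "request"), Event(1.8, "playback_start"),
       Event(2.0, "quality", 2400.0), Event(9.0, "stall_start"),
       Event(10.2, "stall_end"), Event(12.0, "quality", 4800.0)]
print(session_metrics(log))
```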

Title: CTU Depth Decision Algorithms for HEVC: A Survey

Link: Signal Processing: Image Communication

[PDF]

Ekrem Çetinkaya* (Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität Klagenfurt), Hadi Amirpour* (Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität Klagenfurt), Mohammad Ghanbari (Christian Doppler Laboratory ATHENA, University of Essex), and Christian Timmerer (Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität Klagenfurt)

*These authors contributed equally to this work.

Abstract: High Efficiency Video Coding (HEVC) surpasses its predecessors in encoding efficiency by introducing new coding tools at the cost of an increased encoding time complexity. The Coding Tree Unit (CTU) is the main building block used in HEVC. In the HEVC standard, frames are divided into CTUs with a predetermined size of up to 64 × 64 pixels. Each CTU is then divided recursively into a number of equally sized square areas, known as Coding Units (CUs). Although this diversity of frame partitioning increases encoding efficiency, it also increases the time complexity due to the increased number of ways to find the optimal partitioning. To address this complexity, numerous algorithms have been proposed to eliminate unnecessary searches during CTU partitioning by exploiting the correlation in the video. In this paper, existing CTU depth decision algorithms for HEVC are surveyed. These algorithms are categorized into two groups, namely statistics and machine learning approaches. Statistics approaches are further subdivided into neighboring and inherent approaches. Neighboring approaches exploit the similarity between adjacent CTUs to limit the depth range of the current CTU, while inherent approaches use only the available information within the current CTU. Machine learning approaches try to extract and exploit similarities implicitly. Traditional methods like support vector machines or random forests use manually selected features, while recently proposed deep learning methods extract features during training. Finally, this paper discusses extending these methods to more recent video coding formats such as Versatile Video Coding (VVC) and AOMedia Video 1 (AV1).

Keywords: HEVC, Coding Tree Unit, Complexity, CTU Partitioning, Statistics, Machine Learning
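
As a simplified example of the "neighboring" category described above, the sketch below restricts the depth search range of the current CTU based on the depths chosen for already-encoded neighbors. The specific rule and margins are illustrative assumptions, not taken from any particular surveyed algorithm.

```python
# Simplified "neighboring" CTU depth decision: limit the depth search range of
# the current CTU using the depths of its left, above, and above-left neighbors.
def depth_search_range(left_depth, above_depth, above_left_depth,
                       min_depth=0, max_depth=3):
    """Return the (lo, hi) depth range to evaluate for the current CTU."""
    neighbors = [d for d in (left_depth, above_depth, above_left_depth) if d is not None]
    if not neighbors:                          # frame border: no restriction
        return min_depth, max_depth
    lo = max(min_depth, min(neighbors) - 1)    # allow one level shallower
    hi = min(max_depth, max(neighbors) + 1)    # allow one level deeper
    return lo, hi

# Homogeneous neighborhood (all neighbors stopped at depth 1):
print(depth_search_range(1, 1, 1))   # -> (0, 2): depth 3 is never evaluated
```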

Vignesh V Menon

Vignesh V Menon has been invited to give a talk on “Video Coding for HTTP Adaptive Streaming” at Research@Lunch, a research webinar series by Humanitarian Technology (HuT) Labs, Amrita Vishwa Vidyapeetham University, India, exclusively for Ph.D. scholars and UG and PG researchers in India. The talk will introduce the basics of video codecs and highlight the scope of HAS-related research on video encoding.

Time: August 14, 10:00 AM-10:30 AM (CEST) / 1:30 PM-2:00 PM (IST)

The registration form can be found here.

 

ACM Multimedia Systems Conference (MMSys) 2021 | Doctoral Symposium

September 28 – October 01, 2021 | Istanbul, Turkey

Conference Website


Authors: M. Barciś, A. Barciś, N. Tsiogkas, H. Hellwagner.

Title: Information Distribution in Multi-Robot Systems: Generic, Utility-Aware Optimization Middleware.

Frontiers in Robotics and AI 8:685105, July 2021.

This work addresses the problem of what information is worth sending in a multi-robot system under generic constraints, e.g., limited throughput or energy. Our decision method is based on Monte Carlo Tree Search. It is designed as a transparent middleware that can be integrated into existing systems to optimize communication among robots. Furthermore, we introduce techniques to reduce the decision space of this problem to further improve the performance. We evaluate our approach using a simulation study and demonstrate its feasibility in a real-world environment by realizing a proof of concept in ROS 2 on mobile robots.

Published paper
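
To make the underlying optimization problem concrete, the sketch below selects which messages to transmit under a byte budget using a simple greedy utility-per-byte rule. This is explicitly not the paper's MCTS-based method; it only illustrates the kind of utility-aware decision the middleware automates, with made-up utilities and sizes.

```python
# Greedy stand-in for utility-aware message selection under a throughput budget.
# The paper's middleware uses Monte Carlo Tree Search; this is only a sketch of
# the decision problem, with invented message names, utilities, and sizes.
def select_messages(messages, byte_budget):
    """messages: list of (name, utility, size_bytes). Pick greedily by utility density."""
    chosen, used = [], 0
    for name, utility, size in sorted(messages, key=lambda m: m[1] / m[2], reverse=True):
        if used + size <= byte_budget:
            chosen.append(name)
            used += size
    return chosen

msgs = [("pose_update", 5.0, 200), ("full_map_patch", 8.0, 5000), ("battery_status", 1.0, 50)]
print(select_messages(msgs, byte_budget=1000))  # -> ['pose_update', 'battery_status']
```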

Authors: Alireza Erfanian* (Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität Klagenfurt), Hadi Amirpour* (Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität Klagenfurt), Farzad Tashtarian (Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität Klagenfurt), Christian Timmerer (Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität Klagenfurt), and Hermann Hellwagner (Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität Klagenfurt)

*These authors contributed equally to this work.

Link: IEEE Access

Abstract: Due to the growing demand for video streaming services, providers have to deal with increasing resource requirements for increasingly heterogeneous environments. To mitigate this problem, many works have been proposed which aim to (i) improve cloud/edge caching efficiency, (ii) use computation power available in the cloud/edge for on-the-fly transcoding, and (iii) optimize the trade-off among various cost parameters, e.g., storage, computation, and bandwidth. In this paper, we propose LwTE, a novel Light-weight Transcoding approach at the Edge, in the context of HTTP Adaptive Streaming (HAS). During the encoding process of a video segment at the origin side, computationally intense search processes take place. The main idea of LwTE is to store the optimal results of these search processes as metadata for each video bitrate and reuse them at the edge servers to reduce the required time and computational resources for on-the-fly transcoding. LwTE enables us to store only the highest bitrate plus corresponding metadata (of very small size) for unpopular video segments/bitrates. In this way, in addition to the significant reduction in bandwidth and storage consumption, the required time for on-the-fly transcoding of a requested segment is remarkably decreased by utilizing its corresponding metadata; unnecessary search processes are avoided. Popular video segments/bitrates are stored. We investigate our approach for Video-on-Demand (VoD) streaming services by optimizing storage and computation (transcoding) costs at the edge servers and then compare it to conventional methods (store all bitrates, partial transcoding). The results indicate that our approach reduces the transcoding time by at least 80% and decreases the aforementioned costs by 12% to 70% compared to the state-of-the-art approaches.

Keywords: Video streaming, transcoding, video on demand, edge computing.
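
The storage policy sketched below follows the idea described in the abstract: popular segment/bitrate pairs are stored, while for unpopular ones only the highest bitrate plus small search metadata is kept and lower bitrates are transcoded on demand. The data structures, popularity threshold, and the metadata-guided transcode placeholder are assumptions for illustration, not the paper's implementation.

```python
# Illustrative sketch of an LwTE-style store-vs-transcode policy (assumed structures).
def plan_storage(bitrates_kbps, popularity, threshold=0.05):
    """Decide what to keep at the edge for one segment.
    Returns (stored_bitrates, metadata_only_bitrates). `threshold` is assumed."""
    highest = max(bitrates_kbps)
    if popularity >= threshold:
        return sorted(bitrates_kbps), []            # popular: store every bitrate
    return [highest], sorted(b for b in bitrates_kbps if b != highest)

def serve(segment_id, bitrate, stored, metadata, highest):
    """Serve a request from storage or via metadata-guided transcoding."""
    if bitrate in stored:
        return f"deliver stored {segment_id}@{bitrate}kbps"
    # Reuse origin-side search results (e.g., partitioning decisions) so the
    # edge transcoder can skip the expensive search step.
    return (f"transcode {segment_id}: {highest}kbps -> {bitrate}kbps "
            f"using {len(metadata[bitrate])} B of metadata")

stored, meta_only = plan_storage([400, 1200, 2400, 4800], popularity=0.01)
metadata = {b: b"..." for b in meta_only}           # tiny per-bitrate metadata blobs
print(serve("seg_042", 1200, stored, metadata, highest=4800))
```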

Title: WISH: User-centric Bitrate Adaptation for HTTP Adaptive Streaming on Mobile Devices

IEEE 23rd International Workshop on Multimedia Signal Processing (MMSP)

October 06-08, 2021, Tampere, Finland

Authors: Minh Nguyen (Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität Klagenfurt), Ekrem Çetinkaya (Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität Klagenfurt), Hermann Hellwagner (Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität Klagenfurt), and Christian Timmerer (Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität Klagenfurt)

Abstract: Recently, mobile devices have become paramount in online video streaming. The adaptive bitrate (ABR) algorithms of players, which are responsible for selecting the quality of the videos, face critical challenges in providing a high Quality of Experience (QoE) for end users. One open issue is how to ensure the optimal experience for heterogeneous devices in the context of extreme variations of mobile broadband networks. Additionally, end users may have different priorities regarding video quality and data usage (i.e., the amount of data downloaded to the devices through the mobile networks). A generic mechanism for players that enables the specification of various policies to meet end users’ needs is still missing. In this paper, we propose a weighted sum model, namely WISH, that yields high QoE of the video and allows end users to express their preferences among different parameters (i.e., data usage, stall events, and video quality) of video streaming. WISH has been implemented in ExoPlayer, a popular player used in many mobile applications. The experimental results show that WISH improves the QoE by up to 17.6% while saving 36.4% of data usage compared to state-of-the-art ABR algorithms and provides dynamic adaptation to end users’ requirements.

Keywords: ABR Algorithms, HTTP Adaptive Streaming, ITU-T P.1203, WISH
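
The sketch below illustrates a weighted-sum bitrate selection in the spirit described above: each candidate bitrate is scored over data usage, stall risk, and quality, and the user's preferences enter as weights. The cost terms, normalization, and parameter names are illustrative assumptions, not the algorithm as implemented in ExoPlayer by the authors.

```python
# Hedged sketch of a weighted-sum ABR decision; all terms are assumptions.
def select_bitrate(candidates_kbps, throughput_kbps, buffer_s, segment_s,
                   w_data=1.0, w_stall=1.0, w_quality=1.0):
    """Pick the candidate bitrate with the lowest weighted cost."""
    best, best_cost = None, float("inf")
    top = max(candidates_kbps)
    for r in candidates_kbps:
        data_cost = r / top                                       # normalized data usage
        download_s = r * segment_s / max(throughput_kbps, 1e-6)   # time to fetch one segment
        stall_cost = max(0.0, download_s - buffer_s) / segment_s  # rough stall risk
        quality_cost = 1.0 - r / top                              # lower bitrate -> higher cost
        cost = w_data * data_cost + w_stall * stall_cost + w_quality * quality_cost
        if cost < best_cost:
            best, best_cost = r, cost
    return best

# A data-saving user weights data usage heavily; a quality-first user does the opposite.
ladder = [400, 1200, 2400, 4800]
print(select_bitrate(ladder, throughput_kbps=3000, buffer_s=10, segment_s=4, w_data=3.0))     # -> 400
print(select_bitrate(ladder, throughput_kbps=3000, buffer_s=10, segment_s=4, w_quality=3.0))  # -> 4800
```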