Multimedia Communication

HTTP Adaptive Streaming – Quo Vadis?

Christian Timmerer, Tuesday, June 29, 2021

35th Picture Coding Symposium (PCS) 2021

Abstract: Video traffic on the Internet is constantly growing; networked multimedia applications consume a predominant share of the available Internet bandwidth. The HTTP Adaptive Streaming (HAS) technique was certainly a major technical breakthrough and an enabler for both multimedia systems research and industrial networked multimedia services. It resulted in the standardization of MPEG Dynamic Adaptive Streaming over HTTP (MPEG-DASH), which, together with HTTP Live Streaming (HLS), is widely used for multimedia delivery in today’s networks. Existing challenges in multimedia systems research deal with the trade-off between (i) the ever-increasing content complexity, (ii) various requirements with respect to time (most importantly, latency), and (iii) quality of experience (QoE). Optimizing towards one aspect usually negatively impacts at least one of the other two aspects, if not both.

This situation sets the stage for our research work in the ATHENA Christian Doppler (CD) Laboratory (Adaptive Streaming over HTTP and Emerging Networked Multimedia Services; https://athena.itec.aau.at/), jointly funded by public sources and industry.

In this talk, we will present selected novel approaches and research results from the first year of the ATHENA CD Lab’s operation. We will highlight HAS-related research on (i) multimedia content provisioning (machine learning for video encoding); (ii) multimedia content delivery (support of edge processing and virtualized network functions for video networking); (iii) multimedia content consumption and end-to-end aspects (player-triggered segment retransmissions to improve video playout quality); and (iv) novel QoE investigations (adaptive point cloud streaming). We will also put the work into the context of international multimedia systems research.

Vignesh V Menon

At the IEEE International Conference on Image Processing (ICIP), September 19-22, 2021, Alaska, USA.

Authors: Vignesh V Menon (Alpen-Adria-Universität Klagenfurt), Hadi Amirpour (Alpen-Adria-Universität Klagenfurt), Mohammad Ghanbari (School of Computer Science and Electronic Engineering, University of Essex, Colchester, UK), and Christian Timmerer (Alpen-Adria-Universität Klagenfurt).

Abstract: Video delivery over the Internet has become a commodity in recent years, owing to the widespread use of DASH. The DASH specification defines a hierarchical data model for Media Presentation Descriptions (MPDs) in terms of segments. This paper focuses on segmenting video into multiple shots for encoding in VoD HAS applications.
This paper proposes a novel DCT feature-based shot detection and successive elimination algorithm and benchmarks it against the default shot detection algorithm of the x265 implementation of the HEVC standard. Our experimental results demonstrate that the proposed feature-based pre-processor achieves a recall rate 25% higher and an F-measure 20% higher than the benchmark algorithm for shot detection.
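To make the idea of DCT-based shot detection and its evaluation concrete, here is a minimal sketch under stated assumptions. It computes a per-block texture energy from 8x8 DCT coefficients (sum of absolute AC coefficients), flags a shot boundary when the mean change in block energy between consecutive frames exceeds a threshold, and scores detections with precision, recall, and F-measure. The block size, the energy definition, the fixed `threshold`, and all function names are hypothetical illustration choices; the paper's successive elimination step is not reproduced here.

```python
import math

BLOCK = 8  # DCT block size (assumption for illustration)


def dct2(block):
    """Naive orthonormal 2-D DCT-II of a BLOCK x BLOCK list of lists."""
    n = BLOCK
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            cu = math.sqrt((1 if u == 0 else 2) / n)
            cv = math.sqrt((1 if v == 0 else 2) / n)
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = cu * cv * s
    return out


def block_energies(frame):
    """Hypothetical texture feature: per block, sum of |AC| DCT coefficients (DC excluded)."""
    h, w = len(frame), len(frame[0])
    energies = []
    for by in range(0, h - BLOCK + 1, BLOCK):
        for bx in range(0, w - BLOCK + 1, BLOCK):
            coeffs = dct2([row[bx:bx + BLOCK] for row in frame[by:by + BLOCK]])
            total = sum(abs(c) for row in coeffs for c in row)
            energies.append(total - abs(coeffs[0][0]))
    return energies


def detect_shots(frames, threshold):
    """Mark frame i as a shot start if the mean block-energy change vs. frame i-1 exceeds threshold."""
    boundaries = []
    prev = block_energies(frames[0])
    for i in range(1, len(frames)):
        cur = block_energies(frames[i])
        change = sum(abs(a - b) for a, b in zip(cur, prev)) / len(cur)
        if change > threshold:
            boundaries.append(i)
        prev = cur
    return boundaries


def evaluate(detected, ground_truth):
    """Precision, recall and F-measure with exact frame-index matching."""
    tp = len(set(detected) & set(ground_truth))
    precision = tp / len(detected) if detected else 0.0
    recall = tp / len(ground_truth) if ground_truth else 0.0
    f_measure = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f_measure


flat = [[128] * 16 for _ in range(16)]
checker = [[255 * ((x + y) % 2) for x in range(16)] for y in range(16)]
print(detect_shots([flat, flat, checker, checker, flat], threshold=10.0))  # [2, 4]
```

A flat frame has (numerically) zero AC energy while a checkerboard has a large one, so the cuts into and out of the checkerboard segment are detected. Real content would need a content-adaptive threshold rather than the fixed value used here.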

Keywords: HTTP Adaptive Streaming, Video-on-Demand, Shot detection, multi-shot encoding.

Link: https://2021.ieeeicip.org

IEEE Open Journal of Signal Processing

Authors: Ekrem Çetinkaya (Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität Klagenfurt), Hadi Amirpour (Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität Klagenfurt), Christian Timmerer (Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität Klagenfurt, Bitmovin), and Mohammad Ghanbari (Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität Klagenfurt, University of Essex)

Abstract: Video streaming applications have received increasing attention over the years, and HTTP Adaptive Streaming (HAS) has become the de facto solution for video delivery over the Internet. In HAS, each video is encoded at multiple quality levels and resolutions (i.e., representations) to enable adaptation of the streaming session to the viewing and network conditions of the client. This requirement brings encoding challenges with it; e.g., a video source should be encoded efficiently at multiple bitrates and resolutions. Fast multi-rate encoding approaches aim to address this challenge of encoding multiple representations from a single video by re-using information from already encoded representations. In this paper, a convolutional neural network is used to speed up both multi-rate and multi-resolution encoding for HAS. For multi-rate encoding, the lowest bitrate representation is chosen as the reference. For multi-resolution encoding, the highest bitrate representation at the lowest resolution is chosen as the reference. Pixel values from the target resolution and encoding information from the reference representation are used to predict Coding Tree Unit (CTU) split decisions in High-Efficiency Video Coding (HEVC) for the dependent representations. Experimental results show that the proposed method for multi-rate encoding can reduce the overall encoding time by 15.08% and the parallel encoding time by 41.26%, with a 0.89% bitrate increase compared to the HEVC reference software. Similarly, the proposed method for multi-resolution encoding can reduce the encoding time by 46.27% for the overall encoding and by 27.71% for the parallel encoding on average, with a 2.05% bitrate increase.
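The abstract reports separate savings for overall and parallel encoding time. The sketch below shows one common way to compute both, assuming that overall time is the sum of all per-representation encoding times and that, in parallel encoding with one core per representation, dependent representations can only start after the reference has finished. All encoding times are invented for illustration and the function names are hypothetical.

```python
def overall_time(times):
    """Overall encoding time: sum of per-representation encoding times."""
    return sum(times.values())


def parallel_time(times, reference=None):
    """Wall-clock time with one core per representation.

    Without a reference, all encodings start together (stand-alone encoding).
    With a reference, dependent encodings wait for the reference to finish.
    """
    if reference is None:
        return max(times.values())
    dependents = [t for name, t in times.items() if name != reference]
    return times[reference] + max(dependents, default=0.0)


def saving_percent(baseline, proposed):
    return 100.0 * (baseline - proposed) / baseline


# Hypothetical encoding times in seconds; r1 is the lowest-bitrate representation.
baseline = {"r1": 40.0, "r2": 80.0, "r3": 120.0, "r4": 200.0}
proposed = {"r1": 40.0, "r2": 70.0, "r3": 90.0, "r4": 80.0}

print(saving_percent(overall_time(baseline), overall_time(proposed)))  # ~36.4
print(saving_percent(parallel_time(baseline), parallel_time(proposed, "r1")))  # 35.0
```

Under this model the reference sits on the critical path of parallel encoding, which is one reason to pick a fast-to-encode representation (such as the lowest bitrate) as the multi-rate reference.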

Keywords: HTTP Adaptive Streaming, HEVC, Multirate Encoding, Machine Learning

Vignesh V Menon

Conference info: Picture Coding Symposium (PCS), 29 June-2 July 2021, Bristol, UK

Conference Website: https://pcs2021.org

Authors: Vignesh V Menon (Alpen-Adria-Universität Klagenfurt), Hadi Amirpour (Alpen-Adria-Universität Klagenfurt), Christian Timmerer (Alpen-Adria-Universität Klagenfurt, Bitmovin), and Mohammad Ghanbari (School of Computer Science and Electronic Engineering, University of Essex, Colchester, UK)

Abstract: Since video accounts for the majority of today’s Internet traffic, the popularity of HTTP Adaptive Streaming (HAS) is increasing steadily. In HAS, each video is encoded at multiple bitrates and spatial resolutions (i.e., representations) to adapt to heterogeneous network conditions, device characteristics, and end-user preferences. Most streaming services utilize cloud-based encoding techniques, which enable a fully parallel encoding process to speed up the encoding and consequently reduce the overall time complexity. State-of-the-art approaches further improve the encoding process by utilizing encoder analysis information from already encoded representation(s) to reduce the encoding time complexity of the remaining representations. In this paper, we investigate various multi-encoding algorithms (i.e., multi-rate and multi-resolution) and propose novel multi-encoding algorithms for large-scale HTTP Adaptive Streaming deployments. Experimental results demonstrate that the proposed multi-encoding algorithm optimized for the highest compression efficiency reduces the overall encoding time by 39% with a 1.5% bitrate increase compared to stand-alone encodings. Its version optimized for the highest time savings reduces the overall encoding time by 50% with a 2.6% bitrate increase compared to stand-alone encodings.
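One way encoder analysis information can be reused is to restrict the CU depths that a dependent encoding evaluates. The sketch below assumes a heuristic from the multi-rate encoding literature, not necessarily the algorithms proposed in this paper: higher-quality encodings tend to choose deeper (smaller) CUs, so the co-located depth from a higher-quality reference serves as an upper bound and the depth from a lower-quality reference serves as a lower bound. Function names and inputs are hypothetical.

```python
MIN_DEPTH, MAX_DEPTH = 0, 3  # HEVC CU depth: 0 = 64x64 ... 3 = 8x8


def candidate_depths(lower_bound=None, upper_bound=None):
    """CU depths a dependent encoder still evaluates for one CTU (assumed heuristic).

    lower_bound: co-located depth taken from a lower-quality reference
    upper_bound: co-located depth taken from a higher-quality reference
    """
    lo = MIN_DEPTH if lower_bound is None else lower_bound
    hi = MAX_DEPTH if upper_bound is None else upper_bound
    if lo > hi:  # references disagree: fall back to the full depth search
        lo, hi = MIN_DEPTH, MAX_DEPTH
    return list(range(lo, hi + 1))


def depth_checks(lower_refs, upper_refs):
    """Depth levels evaluated over co-located CTUs vs. an unconstrained search."""
    used = sum(len(candidate_depths(lo, hi)) for lo, hi in zip(lower_refs, upper_refs))
    full = (MAX_DEPTH - MIN_DEPTH + 1) * len(lower_refs)
    return used, full


print(candidate_depths(1, 2))                    # [1, 2]
print(depth_checks([0, 1, 2], [1, 3, 2]))        # (6, 12)
```

Skipping depth levels saves rate-distortion optimization work at the cost of occasionally missing the best split, which is the kind of trade-off between time savings and bitrate increase that the abstract quantifies.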

Keywords: HTTP Adaptive Streaming, HEVC, Multi-rate Encoding, Multi-encoding.

Conference info: NOSSDAV’21: The 31st edition of the Workshop on Network and Operating System Support for Digital Audio and Video Sept. 28-Oct. 1, 2021, Istanbul, Turkey

Conference Website: https://nossdav.org/2021/

Authors: Reza Farahani (Alpen-Adria-Universität Klagenfurt), Farzad Tashtarian (Alpen-Adria-Universität Klagenfurt), Alireza Erfanian (Alpen-Adria-Universität Klagenfurt), Christian Timmerer (Alpen-Adria-Universität Klagenfurt, Bitmovin), Mohammad Ghanbari (School of Computer Science and Electronic Engineering, University of Essex, Colchester, UK) and Hermann Hellwagner (Alpen-Adria-Universität Klagenfurt)

Abstract: Recently, HTTP Adaptive Streaming (HAS) has become the dominant video delivery technology over the Internet. In HAS, clients have full control over the media streaming and adaptation processes. In a pure client-based HAS adaptation scheme, the lack of coordination among clients and the lack of awareness of network conditions may lead to sub-optimal user experience and resource utilization. Software-Defined Networking (SDN) has recently been considered to enhance the video streaming process. In this paper, we leverage the capabilities of SDN and Network Function Virtualization (NFV) to introduce an edge- and SDN-assisted video streaming framework called ES-HAS. We employ virtualized edge components to collect HAS clients’ requests and retrieve networking information in a time-slotted manner. These components then run an optimization model, also in a time-slotted manner, to efficiently serve clients’ requests by selecting an optimal cache server (the one with the shortest fetch time). In case of a cache miss, a client’s request is served (i) with an optimal replacement quality (only better quality levels with minimum deviation) from a cache server, or (ii) with the originally requested quality level from the origin server. This approach is validated through experiments on a large-scale testbed, and the performance of our framework is compared to pure client-based strategies and the SABR system [11]. Although SABR and ES-HAS show (almost) identical performance in the number of quality switches, ES-HAS outperforms SABR in terms of playback bitrate and the number of stalls by at least 70% and 40%, respectively.
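The serving policy described above can be illustrated per request. This is a simplified, greedy stand-in for ES-HAS's time-slotted optimization model, not the model itself: a cache hit goes to the server with the shortest estimated fetch time; on a miss, the closest higher quality level available in any cache is used; otherwise the request goes to the origin server. The `CacheServer` type, fetch-time estimates, and cache contents are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CacheServer:
    name: str
    fetch_time_ms: float  # estimated segment fetch time (hypothetical input)
    cached: set = field(default_factory=set)  # {(segment_index, quality_level), ...}


def serve(segment, quality, caches, origin="origin"):
    """Return (server_name, quality_level) for one client request."""
    # (1) Cache hit: the server with the shortest fetch time holding the exact quality.
    hits = [c for c in caches if (segment, quality) in c.cached]
    if hits:
        return min(hits, key=lambda c: c.fetch_time_ms).name, quality
    # (2) Replacement: only better quality levels, minimum deviation first, then fetch time.
    replacements = [(q - quality, c.fetch_time_ms, c.name, q)
                    for c in caches
                    for (seg, q) in c.cached
                    if seg == segment and q > quality]
    if replacements:
        _, _, name, q = min(replacements)
        return name, q
    # (3) Otherwise fetch the originally requested quality from the origin server.
    return origin, quality


caches = [CacheServer("A", 20.0, {(1, 2), (2, 3)}),
          CacheServer("B", 10.0, {(1, 2), (2, 1)})]
print(serve(2, 0, caches))  # ('B', 1)
```

A real deployment would make these decisions jointly for all requests in a time slot, taking link capacities into account; the greedy version only shows the priority order among hit, replacement, and origin fetch.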

Keywords: Dynamic Adaptive Streaming over HTTP (DASH), Edge Computing, Network-Assisted Video Streaming, Quality of Experience (QoE), Software Defined Networking (SDN), Network Function Virtualization (NFV)

The paper “PSTR: Per-title encoding using Spatio-Temporal Resolutions” has been accepted for publication at the IEEE International Conference on Multimedia and Expo (ICME) 2021, July 5-9, 2021, Shenzhen, China.

Authors: Hadi Amirpour (Alpen-Adria-Universität Klagenfurt), Christian Timmerer (Alpen-Adria-Universität Klagenfurt, Bitmovin), and Mohammad Ghanbari (School of Computer Science and Electronic Engineering, University of Essex, Colchester, UK)

Abstract: Current per-title encoding schemes encode the same video content (or snippets/subsets thereof) at various bitrates and spatial resolutions to find an optimal bitrate ladder for each video. Compared to traditional approaches, in which a predefined, content-agnostic (“fit-to-all”) encoding ladder is applied to all video content, per-title encoding can result in (i) a significant decrease in storage and delivery costs and (ii) an increase in the Quality of Experience. In current per-title encoding schemes, the bitrate ladder is optimized using only spatial resolutions, while we argue that, with the emergence of high-framerate videos, this principle can be extended to temporal resolutions as well. In this paper, we improve per-title encoding for each video using spatio-temporal resolutions. Experimental results show that considering both temporal and spatial resolutions doubles the bitrate savings of our approach compared to considering only spatial resolutions.
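The core idea of adding framerate as a ladder dimension can be sketched as a selection step: for each target bitrate, choose the (width, height, fps) configuration with the highest measured quality, optionally restricted to a single framerate for the spatial-only baseline. This is an assumption-laden illustration, not PSTR's actual algorithm; all trial-encode results are invented, quality stands in for a metric such as VMAF, and every configuration is assumed to be encoded at the same target bitrates.

```python
# Hypothetical trial-encode results: (width, height, fps) -> {bitrate_kbps: quality}.
rd = {
    (1920, 1080, 60): {1000: 60.0, 4000: 90.0},
    (1280, 720, 60): {1000: 70.0, 4000: 85.0},
    (1920, 1080, 30): {1000: 68.0, 4000: 88.0},
    (1280, 720, 30): {1000: 75.0, 4000: 80.0},
}


def build_ladder(rd, bitrates, allowed=lambda cfg: True):
    """For each bitrate, pick the allowed (width, height, fps) with the highest quality."""
    ladder = {}
    for b in bitrates:
        options = [(curve[b], cfg) for cfg, curve in rd.items()
                   if allowed(cfg) and b in curve]
        if options:
            quality, cfg = max(options)
            ladder[b] = (cfg, quality)
    return ladder


# Spatial-only ladder: framerate fixed at the source's 60 fps.
spatial_only = build_ladder(rd, [1000, 4000], allowed=lambda cfg: cfg[2] == 60)
# Spatio-temporal ladder: framerate is a free dimension as well.
spatio_temporal = build_ladder(rd, [1000, 4000])
print(spatial_only[1000])     # ((1280, 720, 60), 70.0)
print(spatio_temporal[1000])  # ((1280, 720, 30), 75.0)
```

With these invented numbers, the low rung benefits from halving the framerate while the high rung keeps the full framerate, which shows how a temporal dimension can raise quality at equal bitrate.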

Keywords: Bitrate ladder, per-title encoding, framerate, spatial resolution.

IEEE International Conference on Multimedia and Expo (ICME), 5-9 July 2021, Shenzhen, China

Authors: Bernhard Rinner, Christian Bettstetter, Hermann Hellwagner, and Stephan Weiss

Abstract: Drones have evolved from bulky research platforms to everyday objects that enable a variety of innovative applications. One of the current challenges is to unite individual drones into an integrated autonomous system. They should operate as a networked team to provide novel functionality that drones operating individually can never achieve. This article addresses the building blocks of such multidrone systems: wireless connectivity, communication, and coordination. We discuss implementation aspects in three experimental case studies, compare our techniques for improving resource efficiency, and present some “lessons learned” from our research experience in this area.

NOSSDAV’21: The 31st edition of the Workshop on Network and Operating System Support for Digital Audio and Video
Sept. 28-Oct. 1, 2021, Istanbul, Turkey

Authors: Babak Taraghi (Alpen-Adria-Universität Klagenfurt), Abdelhak Bentaleb (National University of Singapore), Christian Timmerer (Alpen-Adria-Universität Klagenfurt, Bitmovin), Roger Zimmermann (National University of Singapore) and Hermann Hellwagner (Alpen-Adria-Universität Klagenfurt)

Abstract: Adaptive BitRate (ABR) algorithms play a crucial role in delivering the highest possible Quality of Experience (QoE) to viewers in HTTP Adaptive Streaming (HAS). Online video streaming service providers use HAS, the dominant video streaming technique on the Internet, to deliver the best QoE to their users. Viewer satisfaction relies heavily on how well the ABR algorithm of a media player adapts the stream’s quality to the current network conditions. QoE for end-to-end video streaming sessions has been evaluated in many research projects to give better insight into the quality metrics. Objective evaluation models such as ITU Telecommunication Standardization Sector (ITU-T) P.1203 allow the calculation of a Mean Opinion Score (MOS) from various QoE metrics, while subjective evaluation is the best approach for investigating end users’ opinions of the quality experienced during a video streaming session. We conducted subjective evaluations with crowdsourced participants and evaluated the MOS of the sessions using the ITU-T P.1203 quality model. The main contribution of this paper is a comparison of subjective and objective evaluations for well-known heuristic-based ABR algorithms.
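To illustrate what a heuristic-based ABR algorithm does, here is a generic throughput-based rule: estimate throughput as the harmonic mean of recent segment download throughputs, scale it by a safety factor, and pick the highest ladder bitrate that fits. This is a common textbook heuristic, not any specific algorithm evaluated in the paper; the window size, safety factor, and function names are hypothetical.

```python
def harmonic_mean(values):
    return len(values) / sum(1.0 / v for v in values)


def select_bitrate(ladder_kbps, throughput_history_kbps, window=5, safety=0.9):
    """Pick the highest bitrate not exceeding a conservative throughput estimate."""
    recent = throughput_history_kbps[-window:]
    if not recent:  # no measurements yet (e.g., session start): be conservative
        return min(ladder_kbps)
    estimate = safety * harmonic_mean(recent)
    feasible = [b for b in ladder_kbps if b <= estimate]
    return max(feasible) if feasible else min(ladder_kbps)


ladder = [300, 750, 1200, 2400, 4800]
print(select_bitrate(ladder, [3000, 3000, 3000]))  # 2400
print(select_bitrate(ladder, [6000, 1000]))        # 1200
```

The harmonic mean is dominated by low samples, so a single slow download pulls the estimate down sharply; this trades some average bitrate for fewer stalls, which is exactly the kind of behavior that subjective and objective (P.1203) evaluations can rate differently.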

Keywords: HTTP Adaptive Streaming, ABR Algorithms, Quality of Experience, Crowdsourcing, Subjective Evaluation, Objective Evaluation, MOS, (ITU-T) P.1203

Hadi

Authors: Hadi Amirpour (Alpen-Adria-Universität Klagenfurt), Christian Timmerer (Alpen-Adria-Universität Klagenfurt, Bitmovin), and Mohammad Ghanbari (School of Computer Science and Electronic Engineering, University of Essex, Colchester, UK)

Abstract: Light field imaging enables post-processing capabilities such as refocusing, changing the view perspective, and depth estimation. As light field images are represented by multiple views, they contain a huge amount of data, which makes compression inevitable. Although there are several proposals to efficiently compress light field images, their main focus is on encoding efficiency. However, some important functionalities, such as viewpoint and quality scalability, random access, and uniform quality distribution, have not been addressed adequately. In this paper, an efficient light field image compression method based on a deep neural network is proposed, which classifies multiple views into various layers. In each layer, the target view is synthesized from the available views of previously encoded/decoded layers using a deep neural network. This synthesized view is then used as a virtual reference for inter-coding the target view. In this way, random access to an arbitrary view is provided. Moreover, uniform quality distribution among multiple views is addressed. At higher bitrates, where random access to an arbitrary view is more crucial, the bitrate required to access the requested view is minimized.
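Random access in a layered scheme depends on how many views must be decoded before a requested view can be reconstructed. The sketch below computes this transitively for a hypothetical 3x3 view grid in which each view is synthesized only from views of earlier layers; the grid size, layer assignment, and reference choices are illustrative assumptions, not the layer structure used in the paper.

```python
# Hypothetical 3x3 view grid keyed by (row, col). Each view lists the views of
# earlier layers it is synthesized from (layer 0: centre view, coded on its own).
REFS = {
    (1, 1): [],                                          # layer 0
    (0, 0): [(1, 1)], (0, 2): [(1, 1)],                  # layer 1: corners
    (2, 0): [(1, 1)], (2, 2): [(1, 1)],
    (0, 1): [(0, 0), (0, 2)], (1, 0): [(0, 0), (2, 0)],  # layer 2: edges
    (1, 2): [(0, 2), (2, 2)], (2, 1): [(2, 0), (2, 2)],
}


def views_to_decode(target, refs=REFS):
    """All views that must be decoded (transitively) to reconstruct `target`."""
    needed, stack = set(), [target]
    while stack:
        view = stack.pop()
        if view not in needed:
            needed.add(view)
            stack.extend(refs[view])
    return needed


print(len(views_to_decode((0, 1))), "of", len(REFS))  # 4 of 9
```

In this layout, accessing an edge view requires decoding 4 of the 9 views rather than all of them, and the centre view needs only itself; fewer layers between a view and the base layer means cheaper random access.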

Keywords: Light field, Compression, Scalable, Random Access.

Data Compression Conference (DCC)

23-26 March 2021, Snowbird, Utah, USA

https://www.cs.brandeis.edu/~dcc

Authors: Prateek Agrawal (University of Klagenfurt, Austria), Anatoliy Zabrovskiy (University of Klagenfurt, Austria), Adithyan Ilagovan (Bitmovin Inc., CA, USA), Christian Timmerer (University of Klagenfurt, Austria), Radu Prodan (University of Klagenfurt, Austria)

Abstract: HTTP adaptive streaming of video content has become an integral part of the Internet and dominates other streaming protocols and solutions. The time needed to create video content for adaptive streaming ranges from seconds to several hours or days, due to the plethora of video transcoding parameters and video source types. Although the computing resources of different transcoding platforms and services are constantly increasing, accurate and fast transcoding time prediction and scheduling are still crucial. In this paper, we propose a novel method called Fast video Transcoding Time Prediction and Scheduling (FastTTPS) for x264-encoded videos, based on three phases: (i) transcoding data engineering, (ii) transcoding time prediction, and (iii) transcoding scheduling.
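Once transcoding times have been predicted, they can drive a scheduler. The sketch below uses the classic Longest Processing Time first (LPT) greedy heuristic as a stand-in for FastTTPS's scheduling phase, which may work differently: jobs are sorted by predicted time and each is assigned to the currently least-loaded worker. Job names, predicted times, and the worker count are hypothetical, and the predictions are assumed to come from a separate model.

```python
import heapq


def schedule_lpt(predicted_times, n_workers):
    """Assign jobs (name -> predicted seconds) to workers, longest job first,
    each to the least-loaded worker. Returns (assignment, makespan)."""
    heap = [(0.0, w) for w in range(n_workers)]  # (current load, worker id)
    heapq.heapify(heap)
    assignment = {w: [] for w in range(n_workers)}
    for job, t in sorted(predicted_times.items(), key=lambda kv: (-kv[1], kv[0])):
        load, w = heapq.heappop(heap)
        assignment[w].append(job)
        heapq.heappush(heap, (load + t, w))
    makespan = max(load for load, _ in heap)
    return assignment, makespan


jobs = {"a": 7.0, "b": 5.0, "c": 4.0, "d": 3.0, "e": 1.0}
assignment, makespan = schedule_lpt(jobs, n_workers=2)
print(assignment, makespan)  # {0: ['a', 'd'], 1: ['b', 'c', 'e']} 10.0
```

The quality of such a schedule depends directly on the accuracy of the predicted times: an underestimated long job placed late can dominate the makespan, which is why accurate prediction and scheduling are treated together.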