Multimedia Communication


IEEE Transactions on Image Processing (TIP)
Journal Website


Authors: Hadi Amirpour (Alpen-Adria-Universität Klagenfurt, Austria), Christine Guillemot (INRIA, France), Mohammad Ghanbari (University of Essex, UK), and Christian Timmerer (Alpen-Adria-Universität Klagenfurt, Austria)

Abstract: Light field imaging, which captures both spatial and angular information, improves user immersion by enabling post-capture actions, such as refocusing and changing view perspective. However, light fields represent very large volumes of data with a lot of redundancy that coding methods try to remove. State-of-the-art coding methods indeed usually focus on improving compression efficiency and overlook other important features in light field compression such as scalability. In this paper, we propose a novel light field image compression method that enables (i) viewport scalability, (ii) quality scalability, (iii) spatial scalability, (iv) random access, and (v) uniform quality distribution among viewports, while keeping compression efficiency high. To this end, light fields in each spatial resolution are divided into sequential viewport layers, and viewports in each layer are encoded using the previously encoded viewports. In each viewport layer, \revision{the} available viewports are used to synthesize intermediate viewports using a video interpolation deep learning network. The synthesized views are used as virtual reference images to enhance the quality of intermediate views. An image super-resolution method is applied to improve the quality of the lower spatial resolution layer. The super-resolved images are also used as virtual reference images to improve the quality of the higher spatial resolution layer.
The proposed structure also improves the flexibility of light field streaming, provides random access to the viewports, and increases error resiliency. The experimental results demonstrate that the proposed method achieves a high compression efficiency and it can adapt to the display type, transmission channel, network condition, processing power, and user needs.

Keywords—Light field, compression, scalability, random access, deep learning.

The threat of climate change requires a drastic reduction of global greenhouse gas (GHG) emissions in several societal spheres. Thus, this also applies to reducing and rethinking the energy consumption of digital technologies. Video streaming technology is responsible for more than half of digital technology’s global impact [ref]. There is rapid growth, also now with digital and remote work has become more mainstream, in the amount of video data volume, processing of video content, and streaming which affects the rise of energy consumption and its associated GHG emissions.

The International Workshop on Green Multimedia Systems 2023 (GMSys 2023) aims to bring together experts and researchers to present and discuss recent developments and challenges for energy reduction in multimedia systems. This workshop focuses on innovations, concepts, and energy-efficient solutions from video generation to processing, delivery, and further usage.

Find further info at


29th International Conference on MultiMedia Modeling
9 – 12 January 2023 | Bergen, Norway

Daniele Lorenzi (Alpen-Adria-Universität Klagenfurt), Farzad Tashtarian (Alpen-Adria-Universität Klagenfurt), Hadi Amirpour (Alpen-Adria-Universität Klagenfurt), Christian Timmerer (Alpen-Adria-Universität Klagenfurt), and Hermann Hellwagner (Alpen-Adria-Universität Klagenfurt)


HTTP Adaptive Streaming (HAS) is the predominant technique to deliver video contents across the Internet with the increasing demand of its applications. With the evolution of videos to deliver more immersive experiences, such as their evolution in resolution and framerate, highly efficient video compression schemes are required to ease the burden on the delivery process. While AVC/H.264 still represents the most adopted codec, we are experiencing an increase in the usage of new generation codecs (HEVC/H.265, VP9, AV1, VVC/H.266, etc.). Compared to AVC/H.264, these codecs can either achieve the same quality besides a bitrate reduction or improve the quality while targeting the same bitrate. In this paper, we propose a Mixed-Binary Linear Programming (MBLP) model called Multi-Codec Optimization Model at the edge for Live streaming (MCOM-Live) to jointly optimize (i) the overall streaming costs, and (ii) the visual quality of the content played
out by the end-users by efficiently enabling multi-codec content delivery. Given a video content encoded with multiple codecs according to a fixed bitrate ladder, the model will choose among three available policies, i.e., fetch, transcode, or skip, the best option to handle the representations. We compare the proposed model with traditional approaches used in the industry. The experimental results show that our proposed method can reduce the additional latency by up to 23% and the streaming costs by up to 78%, besides improving the visual quality of the delivered segments by up to 0.5 dB, in terms of PSNR.

MCOM architecture overview.

OTEC: An Optimized Transcoding Task Scheduler for Cloud and Fog Environments

ACM CoNEXT 2022ViSNext

Samira Afzal (Alpen-Adria-Universität Klagenfurt), Farzad Tashtarian (Alpen-Adria-Universität Klagenfurt), Hamid Hadian (Alpen-Adria-Universität Klagenfurt), Alireza Erfanian (Alpen-Adria-Universität Klagenfurt), Christian Timmerer (Alpen-Adria-Universität Klagenfurt), and Radu Prodan (Alpen-Adria-Universität Klagenfurt)


Encoding and transcoding videos into multiple codecs and representations is a significant challenge that requires seconds or even days on high-performance computers depending on many technical characteristics, such as video complexity or encoding parameters. Cloud computing offering on-demand computing resources optimized to meet the needs of customers and their budgets is a promising technology for accelerating dynamic transcoding workloads. In this work, we propose OTEC, a novel multi-objective optimization method based on the mixed-integer linear programming model to optimize the computing instance selection for transcoding processes. OTEC determines the type and number of cloud and fog resource instances for video encoding and transcoding tasks with optimized computation cost and time. We evaluated OTEC on AWS EC2 and Exoscale instances for various administrator priorities, the number of encoded video segments, and segment transcoding times. The results show that OTEC can achieve appropriate resource selections and satisfy the administrator’s priorities in terms of time and cost minimization.

OTEC architecture overview.

Vignesh V Menon

Transactions on Multimedia Computing Communications and Applications (TOMM)

Journal Website

Vignesh V Menon (Alpen-Adria-Universität Klagenfurt),  Hadi Amirpour (Alpen-Adria-Universität Klagenfurt), Mohammad Ghanbari (School of Computer Science and Electronic Engineering, University of Essex, Colchester, UK)and Christian Timmerer (Alpen-Adria-Universität Klagenfurt)


In HTTP Adaptive Streaming (HAS), videos are encoded at multiple bitrates and spatial resolutions (i.e., representations) to adapt to the heterogeneity of network conditions, device attributes, and end-user preferences. Encoding the same video segment at
multiple representations increases costs for content providers. State-of-the-art multi-encoding schemes improve the encoding process by utilizing encoder analysis information from already encoded representation(s) to reduce the encoding time of the remaining
representations. These schemes typically use the highest bitrate representation as the reference to accelerate the encoding of the remaining representations. Nowadays, most streaming services utilize cloud-based encoding techniques, enabling a fully parallel
encoding process to reduce the overall encoding time. The highest bitrate representation has the highest encoding time than the other representations. Thus, utilizing it as the reference encoding is unfavorable in a parallel encoding setup as the overall encoding time is bound by its encoding time. This paper provides a comprehensive study of various multi-rate and multi-encoding schemes in both serial and parallel encoding scenarios. Furthermore, it introduces novel heuristics to limit the Rate Distortion Optimization (RDO) process across various representations. Based on these heuristics, three multi-encoding schemes are proposed, which rely on encoder analysis sharing across different representations: (i) optimized for the highest compression efficiency, (ii) optimized for the best compression efficiency-encoding time savings trade-off, and (iii) optimized for the best encoding time savings. Experimental results demonstrate that the proposed multi-encoding schemes (i), (ii), and (iii) reduce the overall serial encoding time by 34.71%, 45.27%, and 68.76% with a 2.3%, 3.1%, and 4.5% bitrate increase to maintain the same VMAF, respectively compared to stand-alone encodings. The overall parallel encoding time is reduced by 22.03%, 20.72%, and 76.82% compared to stand-alone encodings for schemes (i), (ii), and (iii), respectively.

An example of video representations’ storage in HAS. The input video is encoded at multiple resolutions and bitrates. Novel multi-rate and multi-resolution encoder
analysis sharing methods are presented to accelerate encoding in more than one representation.

2022 IEEE International Conference on Image Processing (ICIP)

October 16-19, 2022 | Bordeaux, France

Conference Website


Abstract: According to the Bitmovin Video Developer Report 2021, live streaming at scale has the highest scope for innovation in video streaming services. Currently, there are no open-source implementations available which can predict video complexity for live streaming applications. To this light, we plan to demo the functions of VCA software, and show accuracy of the complexities analyzed by VCA ( using the heatmaps, and show-case the speed of video complexity analysis. VCA can achieve an analysis speed of about 370fps compared to the 5fps speed of the reference SITI implementation. Hence, we show that it can be used for live streaming applications.

In the demo, we also showcase an application of VCA in detail: optimized CRF prediction for adaptive streaming, which is being presented in ICIP’22 (Paper ID: 2030). This scheme improves the compression efficiency of the conventional ABR encoding for live streaming.


  • Vignesh V Menon, University of Klagenfurt, Austria (
  • Christian Feldmann, Bitmovin, Austria (
  • Hadi Amirpour, University of Klagenfurt, Austria (
  • Christian Timmerer, Bitmovin, Austria (

IEEE Access, A Multidisciplinary, Open-access Journal of the IEEE


Minh Nguyen (Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität Klagenfurt), Daniele Lorenzi (Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität Klagenfurt), Farzad Tashtarian (Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität Klagenfurt), Hermann Hellwagner (Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität Klagenfurt), Christian Timmerer (Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität Klagenfurt)

(*) Minh Nguyen and Daniele Lorenzi contributed equally to this work


Abstract: HTTP Adaptive Streaming (HAS) solutions use various adaptive bitrate (ABR) algorithms to select suitable video qualities with the objective of coping with the variations of network connections. HTTP has been evolving with various versions and provides more and more features. Most of the existing ABR algorithms do not significantly benefit from the HTTP development when they are merely supported by the most recent HTTP version. An open research question is “How can new features of the recent HTTP versions be used to enhance the performance of HAS?” To address this question, in this paper, we introduce Days of Future Past+ (DoFP+ for short), a heuristic algorithm that takes advantage of the features of the latest HTTP version, HTTP/3, to provide high Quality of Experience (QoE) to the viewers. DoFP+ leverages HTTP/3 features, including (i) stream multiplexing, (ii) stream priority, and (iii) request cancellation to upgrade low-quality segments in the player buffer while downloading the next segment. The qualities of those segments are selected based on an objective function and throughput constraints. The objective function takes into account two factors, namely the (i) average bitrate and the (ii) video instability of the considered set of segments. We also examine different strategies of download order for those segments to optimize the QoE in limited resources scenarios. The experimental results show an improvement in QoE by up to 33% while the number of stalls and stall duration for DoFP+ are reduced by 86% and 92%, respectively, compared to state-of-the-art ABR schemes. In addition, DoFP+ saves on average up to 16% downloaded data across all test videos. Also, we find that downloading segments sequentially brings more benefits for retransmissions than concurrent downloads; and lower-quality segments should be upgraded before other segments to gain more QoE improvement. Our source code has been published for reproducibility at

Keywords: HTTP/3, ABR algorithm, QoE, HAS, DASH

At Christian Doppler laboratory ATHENA, we offer an internship*) for 2023 for Master Students and we kindly request your applications until the 20th of January 2023 with the following data (in German or English):

  • CV
  • Record of study/transcript (“Studienerfolgsnachweis”)

*) A 3 months period in 2023 (with an exact time slot to be discussed) with the possibility to spend up to 1-month at the industrial partner; 20h per week “Universitäts-KV, Verwendungsgruppe C1, studentische Hilfskraft”

Please send your application by email to

About ATHENA: The Christian Doppler laboratory ATHENA (AdapTive Streaming over HTTP and Emerging Networked MultimediA Services) is jointly proposed by the Institute of Information Technology (ITEC; at Alpen-Adria-Universität Klagenfurt (AAU) and Bitmovin GmbH ( to address current and future research and deployment challenges of HAS and emerging streaming methods. AAU (ITEC) has been working on adaptive video streaming for more than a decade, has a proven record of successful research projects and publications in the field, and has been actively contributing to MPEG standardization for many years, including MPEG-DASH; Bitmovin is a video streaming software company founded by ITEC researchers in 2013 and has developed highly successful, global R&D and sales activities and a world-wide customer base since then.

The aim of ATHENA is to research and develop novel paradigms, approaches, (prototype) tools, and evaluation results for the phases

  1. multimedia content provisioning,
  2. content delivery, and
  3. content consumption in the media delivery chain as well as for
  4. end-to-end aspects, with a focus on, but not being limited to, HTTP Adaptive Streaming (HAS).

The new approaches and insights are to enable Bitmovin to build innovative applications and services to account for the steadily increasing and changing multimedia traffic on the Internet.

Vignesh V Menon

2022 Picture Coding Symposium (PCS)

December 7-9, 2022 | San Jose, CA, USA

Conference Website

Vignesh V Menon (Alpen-Adria-Universität Klagenfurt),  Hadi Amirpour (Alpen-Adria-Universität Klagenfurt), Prajit T Rajendran (Universite Paris-Saclay, France), Mohammad Ghanbari (School of Computer Science and Electronic Engineering, University of Essex, Colchester, UK), and Christian Timmerer (Alpen-Adria-Universität Klagenfurt)


In live streaming applications, a fixed set of bitrate-resolution pairs (known as bitrate ladder) is generally used to avoid additional pre-processing run-time to analyze the complexity of every video content and determine the optimized bitrate ladder. Furthermore, live encoders use the fastest available preset for encoding to ensure the minimum possible latency in streaming. For live encoders, it is expected that the encoding speed is equal to the video framerate. However, an optimized encoding preset may result in (i) increased Quality of Experience (QoE) and (ii) improved CPU utilization while encoding. In this light, this paper introduces a Content-Adaptive encoder Preset prediction Scheme (CAPS) for adaptive live video streaming applications. In this scheme, the encoder preset is determined using Discrete Cosine Transform (DCT)-energy-based low-complexity spatial and temporal features for every video segment, the number of
CPU threads allocated for each encoding instance, and the target encoding speed. Experimental results show that CAPS yields an overall quality improvement of 0.83 dB PSNR and 3.81 VMAF with the same bitrate, compared to the fastest preset encoding
of the HTTP Live Streaming (HLS) bitrate ladder using x265 HEVC open-source encoder. This is achieved by maintaining the desired encoding speed and reducing CPU idle time.


2022 IEEE 24th International Workshop on Multimedia Signal Processing (MMSP)

September 26-28, 2022 | Shanghai, China

Conference Website

Hadi Amirpour (Alpen-Adria-Universität Klagenfurt), Prajit T Rajendran (Universite Paris-Saclay, Paris, France), Vignesh V Menon (Alpen-Adria-Universität Klagenfurt),   Mohammad Ghanbari (School of Computer Science and Electronic Engineering, University of Essex, Colchester, UK)and Christian Timmerer (Alpen-Adria-Universität Klagenfurt)


The increasing demand for high-quality and low-cost video streaming services calls for the prediction of video encoding complexity. The prior prediction of video encoding complexity including encoding time and bitrate predictions are used to allocate resources and set optimized parameters for video encoding effectively. In this paper, a light-weight video encoding complexity prediction (VECP) scheme that predicts the encoding bitrate and the encoding time of video with high accuracy is proposed. Firstly, low-complexity Discrete Cosine Transform (DCT)-energy-based features, namely spatial complexity, temporal complexity, and brightness of videos are extracted, which can efficiently
represent the encoding complexity of videos. The latent vectors are also extracted from a Convolutional Neural Network (CNN) with MobileNet as the backend to obtain additional features from representative frames of each video to assist the prediction process. The extreme gradient boosting (XGBoost) regression algorithm is deployed to predict video encoding complexity using the extracted features. The experimental results demonstrate that VECP predicts the encoding bitrate with an error percentage of up to 3.47% and encoding time with an error percentage of up to 2.89%, but with a significantly low overall latency of 3.5 milliseconds per frame which makes it suitable for both Video on Demand (VoD) and live streaming applications.

VECP architecture