Hadi

EUVIP 2022 Special Session on

“Machine Learning for Immersive Content Processing”

September 2022, Lisbon, Portugal

Link

Organizers:

  • Hadi Amirpour, Klagenfurt University, Austria
  • Christine Guillemot, INSA, France
  • Christian Timmerer, Klagenfurt University, Austria

Brief description:

Remote communication is becoming increasingly important, particularly since the COVID-19 crisis. However, delivering a more realistic visual experience requires more than the traditional two-dimensional (2D) interfaces we know today. Immersive media such as 360-degree video, light fields, point clouds, ultra-high-definition, and high dynamic range content can fill this gap. These modalities, however, face several challenges from capture to display. Learning-based solutions show great promise in improving on traditional solutions to these challenges. In this special session, we will focus on research works aimed at extending and improving the use of learning-based architectures for immersive imaging technologies.

Important dates:

Paper Submissions: 6th June, 2022
Paper Notifications: 11th July, 2022

Vignesh V Menon

2022 IEEE International Conference on Multimedia and Expo (ICME) Industry & Application Track

July 18-22, 2022 | Taipei, Taiwan

Conference Website

Vignesh V Menon (Alpen-Adria-Universität Klagenfurt), Hadi Amirpour (Alpen-Adria-Universität Klagenfurt), Christian Feldmann (Bitmovin, Austria), Mohammad Ghanbari (School of Computer Science and Electronic Engineering, University of Essex, Colchester, UK), and Christian Timmerer (Alpen-Adria-Universität Klagenfurt)

Abstract:

In live streaming applications, a fixed set of bitrate-resolution pairs (known as a bitrate ladder) is typically used during the entire streaming session in order to avoid the additional latency of finding scene transitions and optimized bitrate-resolution pairs for every video content. However, an optimized bitrate ladder per scene may result in (i) decreased storage or delivery costs and/or (ii) increased Quality of Experience (QoE). This paper introduces an Online Per-Scene Encoding (OPSE) scheme for adaptive HTTP live streaming applications. In this scheme, scene transitions and optimized bitrate-resolution pairs for every scene are predicted using Discrete Cosine Transform (DCT)-energy-based low-complexity spatial and temporal features. Experimental results show that, on average, OPSE yields bitrate savings of up to 48.88% in certain scenes while maintaining the same VMAF, compared to the reference HTTP Live Streaming (HLS) bitrate ladder, without any noticeable additional latency in streaming.

The bitrate ladder prediction envisioned using OPSE.
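
To give a rough feel for what DCT-energy-based spatial and temporal features can look like, here is a minimal Python sketch. It is not the OPSE implementation: the block size, the unweighted sum of absolute AC coefficients, the frame-level energy difference, and the cut threshold are all simplifying assumptions, and the names dct2, spatial_energy, and detect_scene_cuts are hypothetical.

```python
import math


def dct2(block):
    """Naive orthonormal 2D DCT-II of a square block (list of lists)."""
    n = len(block)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            cu = math.sqrt(1 / n) if u == 0 else math.sqrt(2 / n)
            cv = math.sqrt(1 / n) if v == 0 else math.sqrt(2 / n)
            out[u][v] = cu * cv * s
    return out


def spatial_energy(frame, n=4):
    """Mean absolute AC energy over non-overlapping n x n luma blocks.

    Assumption: a plain sum of |AC coefficients| stands in for the
    texture feature; the DC coefficient (0, 0) is skipped.
    """
    height, width = len(frame), len(frame[0])
    total, blocks = 0.0, 0
    for r in range(0, height - n + 1, n):
        for c in range(0, width - n + 1, n):
            coeffs = dct2([row[c:c + n] for row in frame[r:r + n]])
            total += sum(abs(coeffs[u][v])
                         for u in range(n) for v in range(n)
                         if (u, v) != (0, 0))
            blocks += 1
    return total / blocks


def detect_scene_cuts(frames, threshold):
    """Return indices of frames whose energy jumps by more than threshold.

    Assumption: the temporal feature is the absolute difference between
    the spatial energies of consecutive frames.
    """
    energies = [spatial_energy(f) for f in frames]
    return [i for i in range(1, len(frames))
            if abs(energies[i] - energies[i - 1]) > threshold]
```

A flat frame has essentially zero AC energy while a textured one does not, so a switch between the two shows up as a large temporal difference. A real encoder front end would compute this on luma planes with a fast DCT rather than the naive loops used here.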

Vignesh V Menon

2022 IEEE International Conference on Multimedia and Expo (ICME)

July 18-22, 2022 | Taipei, Taiwan

Conference Website

Vignesh V Menon (Alpen-Adria-Universität Klagenfurt), Hadi Amirpour (Alpen-Adria-Universität Klagenfurt), Mohammad Ghanbari (School of Computer Science and Electronic Engineering, University of Essex, Colchester, UK), and Christian Timmerer (Alpen-Adria-Universität Klagenfurt)

Abstract:

In live streaming applications, a fixed set of bitrate-resolution pairs (known as a bitrate ladder) is typically used for simplicity and efficiency, in order to avoid the additional encoding run-time required to find optimum resolution-bitrate pairs for every video content. However, an optimized bitrate ladder may result in (i) decreased storage or delivery costs and/or (ii) increased Quality of Experience (QoE). This paper introduces a perceptually-aware per-title encoding (PPTE) scheme for video streaming applications. In this scheme, optimized bitrate-resolution pairs are predicted online based on the Just Noticeable Difference (JND) in quality perception to avoid adding perceptually similar representations to the bitrate ladder. To this end, Discrete Cosine Transform (DCT)-energy-based low-complexity spatial and temporal features for each video segment are used. Experimental results show that, on average, PPTE yields bitrate savings of 16.47% and 27.02% to maintain the same PSNR and VMAF, respectively, compared to the reference HTTP Live Streaming (HLS) bitrate ladder, without any noticeable additional latency in streaming, accompanied by a 30.69% cumulative decrease in storage space for the various representations.

Architecture of PPTE
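
The idea of dropping perceptually similar representations can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the PPTE algorithm: the function name prune_ladder is hypothetical, the quality scores are taken as already predicted, and the JND threshold of 6 VMAF points is an assumed value.

```python
def prune_ladder(candidates, jnd=6.0):
    """Keep only representations at least one JND apart in quality.

    candidates: list of (bitrate_kbps, predicted_vmaf) pairs.
    jnd: assumed just-noticeable difference in VMAF points.
    """
    selected = []
    for bitrate, vmaf in sorted(candidates):
        # Keep the lowest rung, then only rungs whose quality exceeds
        # the last kept rung by at least one JND.
        if not selected or vmaf - selected[-1][1] >= jnd:
            selected.append((bitrate, vmaf))
    return selected


ladder = [(145, 40.0), (300, 52.0), (600, 55.0), (900, 61.0),
          (1600, 70.0), (2400, 73.0), (3400, 80.0)]
pruned = prune_ladder(ladder)
# pruned == [(145, 40.0), (300, 52.0), (900, 61.0), (1600, 70.0), (3400, 80.0)]
```

In this made-up ladder, the 600 kbps and 2400 kbps rungs are dropped because each sits less than one assumed JND above the previously kept rung, which is where the storage savings would come from.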


The kick-off meeting of the “5G-KärntnerFog” project took place on April 21, 2022, at Klagenfurt University. The purpose of this first meeting was primarily to define work structures and work packages and to get to know each partner region. The project partners are ITEC (lead), FH Kärnten, and Siplan.

Title: A Traffic-sign recognition IoT-based Application
Authors: Narges Mehran, Dragi Kimovski, Zahra Najafabadi Samani, Radu Prodan
The work “A Traffic-sign recognition IoT-based Application” was accepted for presentation at the HiPEAC IoT challenge during CSW Spring 2022.
The International Data Corporation predicts that 21.5 billion connected Internet of Things (IoT) devices will generate 55% of all data by 2025. Nowadays, camera sensors can be embedded in most devices. Therefore, we designed an application that receives a video stream from a camera sensor and performs the video processing. First, our application pre-processes the sensed data using two high-quality video encoding and framing frameworks. Afterward, we apply the machine learning (ML) model based on the low and high training accuracies. Because user devices often cannot perform high-load ML training operations, we consider the ML inference operation, which acts as a lightweight trained ML model. Finally, the processed data is packaged for the consumer, such as the driver of a car.

ICME Workshop on Hyper-Realistic Multimedia for Enhanced Quality of Experience (ICMEW)

July 18-22, 2022 | Taipei, Taiwan

Conference Website

Ekrem Çetinkaya (Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität Klagenfurt), Hadi Amirpour (Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität Klagenfurt), and Christian Timmerer (Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität Klagenfurt)

Abstract: Light field imaging enables post-capture actions such as refocusing and changing the view perspective by capturing both spatial and angular information. However, capturing richer information about the 3D scene results in a huge amount of data. To improve the compression efficiency of existing light field compression methods, we investigate the impact of light field super-resolution approaches (both spatial and angular super-resolution) on compression efficiency. To this end, we first downscale light field images over (i) spatial resolution, (ii) angular resolution, and (iii) spatial-angular resolution and encode them using Versatile Video Coding (VVC). We then apply a set of light field super-resolution deep neural networks to reconstruct the light field images at their full spatial-angular resolution and compare their compression efficiency. Experimental results show that encoding the low angular resolution light field image and applying angular super-resolution yields bitrate savings of 51.16% and 53.41% to maintain the same PSNR and SSIM, respectively, compared to encoding the light field image at high resolution.

Keywords: Light field, Compression, Super-resolution, VVC.
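
As a rough illustration of the two downscaling axes mentioned in the abstract, the following Python sketch treats a light field as a 2D grid of views, each view being a 2D image. The function names, the factor of 2, view subsampling for the angular axis, and average pooling for the spatial axis are assumptions made for this example; the paper's actual downscaling filters are not given here.

```python
def angular_downscale(views, factor=2):
    """Keep every factor-th view along both angular dimensions.

    views: grid views[u][v], where each entry is a 2D image.
    """
    return [row[::factor] for row in views[::factor]]


def spatial_downscale(image, factor=2):
    """Average-pool a 2D image over non-overlapping factor x factor windows."""
    height, width = len(image), len(image[0])
    return [
        [sum(image[r + dr][c + dc]
             for dr in range(factor) for dc in range(factor)) / factor ** 2
         for c in range(0, width - factor + 1, factor)]
        for r in range(0, height - factor + 1, factor)
    ]


def spatial_angular_downscale(views, factor=2):
    """Apply angular subsampling first, then spatial pooling to each kept view."""
    return [[spatial_downscale(view, factor) for view in row]
            for row in angular_downscale(views, factor)]
```

With a factor of 2 on both axes, a 4 x 4 grid of 4 x 4 views shrinks to a 2 x 2 grid of 2 x 2 views before encoding; a super-resolution network would then be responsible for restoring the dropped views and pixels after decoding.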

MPEG, specifically ISO/IEC JTC 1/SC 29/WG 3 (MPEG Systems), has just been awarded a Technology & Engineering Emmy® Award for its ground-breaking MPEG-DASH standard. Dynamic Adaptive Streaming over HTTP (DASH) is the first international de jure standard that enables efficient streaming of video over the Internet, and it has changed the entire video streaming industry, including (but not limited to) on-demand, live, and low-latency streaming, as well as 5G and the next generation of hybrid broadcast-broadband. The first edition was published in April 2012, and MPEG is currently working towards publishing the 5th edition, demonstrating an active and lively ecosystem that is still being developed and improved to address the requirements and challenges of modern media transport applications and services.

This award belongs to 90+ researchers and engineers from around 60 companies all around the world who participated in the development of the MPEG-DASH standard for over 12 years.

From left to right: Kyung-mo Park, Cyril Concolato, Thomas Stockhammer, Yuriy Reznik, Alex Giladi, Mike Dolan, Iraj Sodagar, Ali Begen, Christian Timmerer, Gary Sullivan, Per Fröjdh, Young-Kwon Lim, Ye-Kui Wang. (Photo © Yuriy Reznik)

Christian Timmerer, director of the Christian Doppler Laboratory ATHENA, chaired the evaluation of responses to the call for proposals and has since served as MPEG-DASH Ad-hoc Group (AHG) / Break-out Group (BoG) co-chair as well as co-editor of Part 2 of the standard. For a more detailed history of the MPEG-DASH standard, the interested reader is referred to Christian Timmerer’s blog post “HTTP Streaming of MPEG Media” (capturing the development of the first edition) and Nicolas Weill’s blog post “MPEG-DASH: The ABR Esperanto” (DASH timeline).

We are happy that our tutorial on Open Challenges of Interactive Video Search and Evaluation (by Jakub Lokoc, Klaus Schöffmann, Werner Bailer, Luca Rossetto, and Björn Thor Jonsson) has been accepted for ACM Multimedia (ACMMM 2022), to be held in Lisbon, Portugal, in October 2022.

The 5th Annual Lifelog Search Challenge (LSC 2022), co-organized by Klaus Schöffmann, will be this year’s grand challenge at the ACM International Conference on Multimedia Retrieval (ICMR 2022) in Newark, NJ, USA. More information here: https://www.icmr2022.org/program/challenges/