Multimedia Communication

GenStream: Semantic Streaming Framework for Generative Reconstruction of Human-centric Media

ACM Multimedia 2025

October 27 – October 31, 2025

Dublin, Ireland

[PDF]

Emanuele Artioli (AAU, Austria), Daniele Lorenzi (AAU, Austria), Shivi Vats (AAU, Austria), Farzad Tashtarian (AAU, Austria), Christian Timmerer (AAU, Austria)

Abstract: Video streaming dominates global internet traffic, yet conventional pipelines remain inefficient for structured, human-centric content such as sports, performance, or interactive media. Standard codecs re-encode entire frames, foreground and background alike, treating all pixels uniformly and ignoring the semantic structure of the scene. This leads to significant bandwidth waste, particularly in scenarios where backgrounds are static and motion is constrained to a few salient actors. We introduce GenStream, a semantic streaming framework that replaces dense video frames with compact, structured metadata. Instead of transmitting pixels, GenStream encodes each scene as a combination of skeletal keypoints, camera viewpoint parameters, and a static 3D background model. These elements are transmitted to the client, where a generative model reconstructs photorealistic human figures and composites them into the 3D scene from the original viewpoint. This paradigm enables extreme compression, achieving over 99.9% bandwidth reduction compared to HEVC. We provide a partial validation of GenStream on Olympic figure skating footage and demonstrate its potential for high perceptual fidelity from minimal transmitted data. Looking forward, GenStream opens new directions in volumetric avatar synthesis, canonical 3D actor fusion across views, personalized and immersive viewing experiences at arbitrary viewpoints, and lightweight scene reconstruction, laying the groundwork for scalable, intelligent streaming in the post-codec era.
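To make the payload idea concrete, the sketch below shows what a single frame of semantic metadata could look like and how its size compares to a rough per-frame HEVC budget. It is a minimal illustration under our own assumptions (17 keypoints per actor, a 7-parameter camera, JSON serialization), not the authors' wire format.

import json
from dataclasses import dataclass, asdict
from typing import List

@dataclass
class ActorPose:
    actor_id: int
    keypoints: List[List[float]]   # e.g., 17 COCO-style (x, y, confidence) triplets

@dataclass
class FrameMetadata:
    frame_index: int
    camera: List[float]            # viewpoint parameters, e.g., position, rotation, focal length
    background_id: str             # reference to a static 3D background model, shared once
    actors: List[ActorPose]

def payload_bytes(meta: FrameMetadata) -> int:
    # Size of the compact per-frame metadata when serialized as JSON.
    # A real system would pack this in binary, shrinking it further.
    return len(json.dumps(asdict(meta)).encode("utf-8"))

# One skater, one frame: 17 keypoints plus 7 camera parameters (all values invented).
frame = FrameMetadata(
    frame_index=0,
    camera=[0.0, 1.6, 8.0, 0.0, 0.0, 0.0, 35.0],
    background_id="rink_v1",
    actors=[ActorPose(actor_id=0, keypoints=[[640.0 + i, 360.0 + i, 0.9] for i in range(17)])],
)
semantic_size = payload_bytes(frame)
hevc_frame_budget = (5_000_000 / 8) / 30   # rough bytes per frame for a 5 Mbps, 30 fps HEVC stream
print(f"semantic payload: {semantic_size} B vs. HEVC frame budget: {hevc_frame_budget:.0f} B")
print(f"reduction: {100 * (1 - semantic_size / hevc_frame_budget):.2f}%")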

Receiving Kernel-Level Insights via eBPF: Can ABR Algorithms Adapt Smarter?

Würzburg Workshop on Next-Generation Communication Networks (WueWoWAS) 2025

6 – 8 Oct 2025, Würzburg, Germany

[PDF]

Mohsen Ghasemi (Sharif University of Technology, Iran); Daniele Lorenzi (Alpen-Adria-Universität Klagenfurt, Austria); Mahdi Dolati (Sharif University of Technology, Iran); Farzad Tashtarian (Alpen-Adria-Universität Klagenfurt, Austria); Sergey Gorinsky (IMDEA Networks Institute, Spain); Christian Timmerer (Alpen-Adria-Universität Klagenfurt & Bitmovin, Austria)

Abstract: The rapid rise of video streaming services such as Netflix and YouTube has made video delivery the largest driver of global Internet traffic, including over mobile networks such as 5G and the upcoming 6G. To maintain playback quality, client devices employ Adaptive Bitrate (ABR) algorithms that adjust video quality based on metrics like available bandwidth and buffer occupancy. However, these algorithms often react slowly to sudden bandwidth fluctuations due to limited visibility into network conditions, leading to stall events that significantly degrade the user’s Quality of Experience (QoE). In this work, we introduce CaBR, a Congestion-aware adaptive BitRate decision module designed to operate on top of existing ABR algorithms. CaBR enhances video streaming performance by leveraging real-time, in-kernel network telemetry collected via the extended Berkeley Packet Filter (eBPF). By utilizing congestion metrics such as queue lengths observed at network switches, CaBR refines the bitrate selection of the underlying ABR algorithms for upcoming segments, enabling faster adaptation to changing network conditions. Our evaluation shows that CaBR significantly reduces playback stalls and improves QoE by up to 25% compared to state-of-the-art approaches in a congested environment.
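As a rough illustration of where such a module sits (not the CaBR implementation itself), the sketch below wraps a throughput-based ABR choice and steps it down one rung when in-kernel telemetry, e.g., a queue-occupancy signal exported via eBPF, crosses a threshold; the ladder, threshold, and field names are invented for the example.

from bisect import bisect_right

LADDER_KBPS = [500, 1200, 2500, 4500, 8000]   # example bitrate ladder

def baseline_abr(throughput_kbps: float) -> int:
    # Stand-in for any rate-based ABR: pick the highest rung below measured throughput.
    idx = bisect_right(LADDER_KBPS, throughput_kbps) - 1
    return max(idx, 0)

def cabr_refine(idx: int, queue_fill: float, high_water: float = 0.7) -> int:
    # Step the baseline choice down when the reported queue occupancy is high,
    # backing off one rung before a stall actually happens.
    if queue_fill >= high_water and idx > 0:
        return idx - 1
    return idx

# Example: throughput still looks fine, but the bottleneck queue is already 85% full.
idx = baseline_abr(throughput_kbps=6000)
refined = cabr_refine(idx, queue_fill=0.85)
print(f"baseline pick: {LADDER_KBPS[idx]} kbps -> refined pick: {LADDER_KBPS[refined]} kbps")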

On Thursday, July 30, 2025, Daniele Lorenzi successfully defended his PhD thesis (QoE- and Energy-aware Content Consumption for HTTP Adaptive Streaming) under the supervision of Prof. Hermann Hellwagner and Prof. Christian Timmerer. The defense was chaired by Assoc.-Prof. DI Dr. Klaus Schöffmann, and the examiners were Assoc.-Prof. Luca De Cicco and Dr.-Ing. habil. Christian Herglotz.

We are pleased to congratulate Dr. Daniele Lorenzi on successfully passing his Ph.D. examination!


Authors: Ahmed Telili (TII, UAE), Wassim Hamidouche (TII, UAE), Brahim Farhat (TII, UAE), Hadi Amirpour (AAU, Austria), Christian Timmerer (AAU, Austria), Ibrahim Khadraoui (TII, UAE), Jiajie Lu (Politecnico di Milano, Italy), The Van Le (IVCL, South Korea), Jeonneung Baek (IVCL, South Korea), Jin Young Lee (IVCL, South Korea), Yiying Wei (AAU, Austria), Xiaopeng Sun (Meituan Inc., China), Yu Gao (Meituan Inc., China), JianCheng Huang (Meituan Inc., China), and Yujie Zhong (Meituan Inc., China)

Journal: Signal Processing: Image Communication

Abstract: Omnidirectional (360-degree) video is rapidly gaining popularity due to advancements in immersive technologies like virtual reality (VR) and extended reality (XR). However, real-time streaming of such videos, particularly in live mobile scenarios such as unmanned aerial vehicles (UAVs), is hindered by limited bandwidth and strict latency constraints. While traditional methods such as compression and adaptive resolution are helpful, they often compromise video quality and introduce artifacts that diminish the viewer’s experience. Additionally, the unique spherical geometry of 360-degree video, with its wide field of view, presents challenges not encountered in traditional 2D video. To address these challenges, we initiated the 360-degree Video Super Resolution and Quality Enhancement challenge. This competition encourages participants to develop efficient machine learning (ML)-powered solutions to enhance the quality of low-bitrate compressed 360-degree videos, under two tracks focusing on 2× and 4× super-resolution (SR). In this paper, we outline the challenge framework, detailing the two competition tracks and highlighting the SR solutions proposed by the top-performing models. We assess these models within a unified framework, considering (i) quality enhancement, (ii) bitrate gain, and (iii) computational efficiency. Our findings show that lightweight single-frame models can effectively balance visual quality and runtime performance under constrained conditions, setting strong baselines for future research. These insights offer practical guidance for advancing real-time 360-degree video streaming, particularly in bandwidth-limited immersive applications.
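For readers unfamiliar with how quality is typically scored on the sphere, the snippet below computes WS-PSNR (weighted-to-spherically-uniform PSNR) for equirectangular frames, a common objective metric for 360-degree video; it is a generic reference sketch, not the challenge's official evaluation code.

import numpy as np

def ws_psnr(ref: np.ndarray, dist: np.ndarray, max_val: float = 255.0) -> float:
    # ref, dist: (H, W) or (H, W, C) equirectangular frames.
    ref = ref.astype(np.float64)
    dist = dist.astype(np.float64)
    h = ref.shape[0]
    # Latitude weights per row: rows near the poles count less than rows at the equator.
    w = np.cos((np.arange(h) + 0.5 - h / 2) * np.pi / h)
    w = w.reshape((h,) + (1,) * (ref.ndim - 1))
    weights = np.broadcast_to(w, ref.shape)
    wmse = np.sum(weights * (ref - dist) ** 2) / np.sum(weights)
    return 10 * np.log10(max_val ** 2 / wmse)

# Fabricated reference frame and a mildly distorted version, just to exercise the metric.
rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(960, 1920, 3)).astype(np.float64)
dist = np.clip(ref + rng.normal(0, 4, ref.shape), 0, 255)
print(f"WS-PSNR: {ws_psnr(ref, dist):.2f} dB")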



Co-located with ACM Multimedia 2025

URL: https://weizhou-geek.github.io/workshop/MM2025.html

In health and medicine, an immense amount of data is being generated by distributed sensors and cameras, as well as by multimodal digital health platforms that support multimedia such as audio, video, images, 3D geometry, and text. The availability of such multimedia data from medical devices and digital record systems has greatly increased the potential for automated diagnosis. The past several years have witnessed an explosion of interest, and dizzyingly fast development, in computer-aided medical investigations using MRI, CT, X-rays, images, point clouds, etc. This workshop focuses on multimedia computing techniques (including mobile and hardware solutions) for health and medicine that target real-world data and problems in healthcare, involve a large number of stakeholders, and are closely connected with people’s health.


ACM MM’25 Tutorial: Perceptually Inspired Visual Quality Assessment in Multimedia Communication

ACM MM 2025, October 27, 2025, Dublin, Ireland

https://acmmm2025.org/tutorial/

Tutorial speakers:

  • Wei Zhou (Cardiff University)
  • Hadi Amirpour (University of Klagenfurt)

Tutorial description:

As multimedia services like video streaming, video conferencing, virtual reality (VR), and online gaming continue to expand, ensuring high perceptual quality becomes a priority for maintaining user satisfaction and competitiveness. However, during acquisition, compression, transmission, and storage, multimedia content undergoes various distortions, causing degradation in experienced quality. Thus, perceptual quality assessment, which focuses on evaluating the quality of multimedia content based on human perception, is essential for optimizing user experiences in advanced communication systems. Several challenges are involved in the quality assessment process, including the diverse characteristics of multimedia content (image, video, VR, point cloud, mesh, multimodal data, etc.) as well as complex distortion scenarios and viewing conditions. The tutorial first presents a detailed overview of principles and methods for perceptually inspired visual quality assessment. This includes both subjective methods, where users directly rate their experience, and objective methods, where algorithms predict human perception based on measurable factors such as bitrate, frame rate, and compression levels. Building on these basics, metrics for different types of multimedia data are then introduced. Beyond traditional images and videos, immersive multimedia and AI-generated content are also covered.
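As a small, self-contained example of the subjective side discussed above, the following snippet aggregates per-clip ratings into a Mean Opinion Score (MOS) with a 95% confidence interval; the ratings and clip names are fabricated purely for illustration.

import math
import statistics

def mos_with_ci(ratings, z=1.96):
    # Mean opinion score and 95% confidence interval for one stimulus.
    mos = statistics.mean(ratings)
    ci = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, ci

ratings_per_clip = {
    "clip_A_1080p_8Mbps": [5, 4, 5, 4, 4, 5, 3, 4, 5, 4],
    "clip_A_360p_0.5Mbps": [2, 1, 2, 3, 2, 2, 1, 2, 2, 3],
}
for clip, ratings in ratings_per_clip.items():
    mos, ci = mos_with_ci(ratings)
    print(f"{clip}: MOS = {mos:.2f} ± {ci:.2f}")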


URL: https://dl.acm.org/journal/tomm

Authors: Ahmed Telili (INSA, Rennes, France), Wassim Hamidouche (INSA, Rennes, France), Hadi Amirpour (Alpen-Adria-Universität Klagenfurt), Sid Ahmed Fezza (INPTIC, Algeria), Christian Timmerer (Alpen-Adria-Universität Klagenfurt), and Luce Morin (INSA, Rennes, France)

Abstract:
HTTP adaptive streaming (HAS) has emerged as a prevalent approach for over-the-top (OTT) video streaming services due to its ability to deliver a seamless user experience. A fundamental component of HAS is the bitrate ladder, which comprises a set of encoding parameters (e.g., bitrate-resolution pairs) used to encode the source video into multiple representations. This adaptive bitrate ladder enables the client’s video player to dynamically adjust the quality of the video stream in real-time based on fluctuations in network conditions, ensuring uninterrupted playback by selecting the most suitable representation for the available bandwidth. The most straightforward approach involves using a fixed bitrate ladder for all videos, consisting of pre-determined bitrate-resolution pairs known as one-size-fits-all. Conversely, the most reliable technique relies on intensively encoding all resolutions over a wide range of bitrates to build the convex hull, thereby optimizing the bitrate ladder by selecting the representations from the convex hull for each specific video. Several techniques have been proposed to predict content-based ladders without performing a costly, exhaustive search encoding. This paper provides a comprehensive review of various convex hull prediction methods, including both conventional and learning-based approaches. Furthermore, we conduct a benchmark study of several handcrafted- and deep learning (DL)-based approaches for predicting content-optimized convex hulls across multiple codec settings. The considered methods are evaluated on our proposed large-scale dataset, which includes 300 UHD video shots encoded with software and hardware encoders using three state-of-the-art video standards, including AVC/H.264, HEVC/H.265, and VVC/H.266, at various bitrate points. Our analysis provides valuable insights and establishes baseline performance for future research in this field.
Dataset URL: https://nasext-vaader.insa-rennes.fr/ietr-vaader/datasets/br_ladder
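To illustrate the exhaustive-search baseline that the survey benchmarks against, the toy sketch below builds a per-title ladder by keeping only the encodings on the quality/bitrate frontier (a simple Pareto-front stand-in for the convex hull); the measurements are invented, and a real pipeline would use encoder logs and VMAF or PSNR scores.

# (resolution, bitrate_kbps, vmaf) from a hypothetical exhaustive encoding pass
encodings = [
    ("360p", 300, 55.0), ("360p", 800, 68.0), ("360p", 1000, 69.0),
    ("540p", 900, 71.0), ("540p", 1800, 80.0),
    ("720p", 2200, 84.0), ("720p", 3500, 89.0),
    ("1080p", 4000, 88.5), ("1080p", 6000, 94.0),
]

def per_title_ladder(points):
    # Keep, in order of increasing bitrate, only points that improve quality,
    # i.e., drop representations dominated by a cheaper encoding.
    ladder, best_quality = [], float("-inf")
    for res, br, q in sorted(points, key=lambda p: p[1]):
        if q > best_quality:
            ladder.append((res, br, q))
            best_quality = q
    return ladder

for res, br, q in per_title_ladder(encodings):
    print(f"{res:>6} @ {br:>5} kbps  (VMAF {q})")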


Perceptually-aware Online Per-title Encoding for Live Video Streaming – US Patent

PDF

Vignesh Menon (Alpen-Adria-Universität Klagenfurt, Austria), Hadi Amirpour (Alpen-Adria-Universität Klagenfurt, Austria), and Christian Timmerer (Alpen-Adria-Universität Klagenfurt, Austria)

Abstract: Techniques for implementing perceptually aware per-title encoding may include receiving an input video, a set of resolutions, a maximum target bitrate and a minimum target bitrate, extracting content aware features for each segment of the input video, predicting a perceptually aware bitrate-resolution pair for each segment using a model configured to optimize for a quality metric using constants trained for each of the set of resolutions, generating a target encoding set including a set of perceptually aware bitrate-resolution pairs, and encoding the target encoding set. The content aware features may include a spatial energy feature and an average temporal energy. According to these methods only a subset of bitrates and resolutions, less than a full set of bitrates and resolutions, are encoded to provide high quality video content for streaming.
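The sketch below shows a simplified way to compute the two content-aware features named in the abstract, a spatial-energy proxy from block DCT coefficients and an average temporal energy from frame differences; the block size, weighting, and any mapping to bitrate-resolution pairs are our assumptions, not the patented model.

import numpy as np
from scipy.fft import dct

def block_dct_energy(frame, block=32):
    # Mean absolute energy of the AC DCT coefficients over luma blocks.
    h, w = frame.shape
    total, count = 0.0, 0
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            coeffs = dct(dct(frame[y:y+block, x:x+block], axis=0, norm="ortho"),
                         axis=1, norm="ortho")
            coeffs[0, 0] = 0.0          # drop the DC term, keep texture energy only
            total += np.abs(coeffs).sum()
            count += 1
    return total / max(count, 1)

def segment_features(frames):
    # Spatial energy E (per-frame texture) and average temporal energy h (frame-to-frame change).
    spatial = np.mean([block_dct_energy(f) for f in frames])
    temporal = np.mean([block_dct_energy(np.abs(b - a)) for a, b in zip(frames, frames[1:])])
    return spatial, temporal

# Fabricated two-frame "segment" just to exercise the functions.
rng = np.random.default_rng(1)
frames = [rng.random((128, 128)) * 255, rng.random((128, 128)) * 255]
E, h = segment_features(frames)
print(f"spatial energy E = {E:.1f}, average temporal energy h = {h:.1f}")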

Title: Project “Scalable Platform for Innovations on Real-time Immersive Telepresence” (SPIRIT) successfully passed periodic review

The “Scalable Platform for Innovations on Real-time Immersive Telepresence” (SPIRIT) project, a Horizon Europe innovation initiative uniting seven consortium partners, including ITEC from the University of Klagenfurt, has successfully completed its periodic review, which took place in November 2024.

SPIRIT aims to develop a “multi-site, interconnected framework dedicated for supporting the operation of heterogeneous collaborative telepresence applications at large scale”.

ITEC focuses on three key areas in SPIRIT:

  • determining subjective and objective metrics for the Quality of Experience (QoE) of volumetric video,
  • developing a Live Low Latency DASH (Dynamic Adaptive Streaming over HTTP) system for the transmission of volumetric video, and
  • contributing to standardisation bodies regarding work done in volumetric video.

The review committee was satisfied with the project’s progress and accepted all deliverables. The project was praised for a successful first round of open calls, which saw a remarkable 61 applicants for 11 available spots.

ITEC’s research on the QoE of volumetric video through subjective testing was also deemed impressive, with over 2000 data points collected across two rounds of testing. Contributions to standardisation bodies such as MPEG and 3GPP were likewise praised.

ITEC continues to work in the SPIRIT project, focusing on the second round of open calls and Live Low Latency DASH transmission of volumetric video.

DORBINE is a cooperative project between AIR6 Systems and Alpen-Adria-Universität Klagenfurt (AAU) (Farzad Tashtarian, project leader; Christian Timmerer and Hadi Amirpourazarian) and is funded by the Austrian Research Promotion Agency (FFG).

Project description: Renewable energy plays a critical role in the global transition to sustainable and environmentally friendly power sources, and among the various technologies, turbines stand out as a key contributor. Wind turbines, for example, can convert up to 45% of the available wind energy into electricity, with modern designs reaching efficiencies as high as 50%, depending on conditions. The DORBINE project aims to enhance wind turbine efficiency in electricity production by developing an innovative inspection framework powered by cutting-edge AI techniques. It leverages a swarm of drones equipped with high-resolution cameras and advanced sensors to perform real-time, detailed blade inspections without the need for turbine shutdowns.