Hadi

Title: Complexity prediction of hardware and software video transcoding in the cloud

Authors: Taieb Chachou, Sid Ahmed Fezza, Wassim Hamidouche, Ghalem Belalem, Hadi Amirpour

 

Abstract: Today, video content constitutes a significant portion of internet traffic. This video can be viewed by a wide range of devices with varying characteristics and under different network conditions. Video transcoding is a crucial mechanism for adapting video content to this diverse array of devices and bandwidth requirements while ensuring the best possible user experience. However, video transcoding is a computationally intensive process, requiring scalable infrastructure like cloud computing to efficiently handle the complexity and volume of tasks. In this paper, we propose a novel method to predict transcoding time across different types of platforms (CPU and GPU) and codecs (H.264/AVC, H.265/HEVC).

Unlike existing approaches that focus mainly on CPU-based transcoding, the proposed model explicitly considers hardware-accelerated (GPU) transcoding, where accelerators significantly influence video transcoding performance in cloud computing.

The predicted transcoding time can be utilized to optimize the scheduling of transcoding tasks in cloud computing, helping to ensure optimal load balancing and minimize total transcoding time while maintaining the highest video quality. The proposed solution consists of two essential phases: (i) dataset construction and (ii) model construction. The first phase involves video selection, segmentation, and video transcoding. The second phase focuses on analyzing the most important features that influence the prediction of transcoding time and developing a machine learning-based model for accurate video transcoding time prediction. Experimental results demonstrate that the XGBoost model achieves superior prediction accuracy across both software and hardware codecs, reaching a global coefficient of determination of R² = 0.993 when evaluated on the complete dataset, which includes video segments transcoded using H.264/AVC and H.265/HEVC codecs on CPU and GPU platforms. This performance represents an improvement of approximately 7.45% compared to state-of-the-art methods.
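
For readers unfamiliar with the metric, the coefficient of determination reported above can be computed as in the following minimal sketch. The function name and the sample transcoding times are hypothetical and chosen only for illustration; they are not taken from the paper's dataset.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    # Residual sum of squares: error left after prediction.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: spread of the measurements around their mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all measurements are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical measured vs. predicted transcoding times in seconds.
measured = [12.0, 30.5, 8.2, 45.1, 22.3]
predicted = [11.5, 31.0, 9.0, 44.0, 23.0]
score = r_squared(measured, predicted)
print(f"R^2 = {score:.4f}")
```

A value close to 1 means the predictions explain almost all of the variance in the measured transcoding times; a model that always predicts the mean scores 0.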

As part of "FÄKT", an initiative of the Austrian Academy of Sciences (Österreichische Akademie der Wissenschaften) that prepares science videos specifically for 10- to 14-year-olds, a new contribution offers exciting insights into the world of video streaming: Christian Timmerer, head of the Christian Doppler (CD) Laboratory for Adaptive Streaming over HTTP and Emerging Networked Multimedia Services at the University of Klagenfurt and two-time Technology & Engineering Emmy Award winner, explains in clear terms how video content is transmitted worldwide and continuously optimized.

The short film shows in an accessible way which technologies are behind modern streaming services and how research helps to continually improve the quality and efficiency of video transmission.

Direct link to the video: Dein Video hängt gerade? Ein Forscher hat das vor Jahren gelöst. Zweimal ausgezeichnet. (Is your video stalling right now? A researcher solved that years ago. Awarded twice.)

Banner: video screenshot excerpt (c) FÄKT

The Lange Nacht der Forschung 2026 (Long Night of Research) turned out to be a truly special evening, one that once again demonstrated how powerful it can be to bring science and research closer to the public. Thanks to the remarkable engagement, creativity, and enthusiasm of everyone involved, complex ideas were transformed into hands-on experiences for a broad and diverse audience.

With more than 9,000 visitors across the Lakeside Science & Technology Park and the University of Klagenfurt campus, the event was a great success. Each individual station contributed to making research tangible, interactive, and inspiring.

Strong Presence of Our Department

Our department was proudly represented with six stations/booths, four of which were hosted by our lab. Together, they showcased cutting-edge research in multimedia, artificial intelligence, and interactive systems, thus demonstrating both scientific depth and real-world impact.

Highlights from Our Lab

At our lab’s four stations, visitors had the opportunity to explore current research in an engaging and interactive way:

Detecting Damage in Wind Turbines with AI (L25)
How can we inspect wind turbines without shutting them down? This station introduced the DORBINE project, where AI-powered drone swarms are used for automated inspection. A two-meter model vividly demonstrated how such intelligent systems could reduce costs and downtime while improving energy efficiency.

Making 3D Video More Realistic (L26)
Visitors were introduced to 3D Gaussian Splatting (3DGS), a next-generation 3D video technology that enables highly realistic rendering of scenes with reduced data requirements. Through hands-on interaction, they experienced how real-world environments can be captured and reproduced as immersive 3D spaces.

Enhancing Video Quality with Super-Resolution (L27)
This station focused on AI-based super-resolution techniques. Attendees could directly compare videos of different quality levels and observe in real time how machine learning reconstructs fine details and textures from low-resolution footage.

Experiencing Multimedia with 3D Interaction (L28)
Using Apple Vision Pro head-mounted displays, visitors explored stereoscopic spatial videos and tested their skills in a 3D dart game. This station highlighted how perception and interaction merge in next-generation multimedia experiences, offering a glimpse into future human-computer interaction.

Making Research Tangible

What made the evening particularly special was not only the technologies themselves but also the way they were communicated: interactive demos, hands-on exploration, and direct conversations with researchers allowed visitors of all ages to engage with science in a meaningful way.

Thank You

A big thank you to everyone who contributed to making this event such a success, through preparation, creativity, and dedication on-site. Events like the Lange Nacht der Forschung thrive on teamwork, and this year was a perfect example.

On 20 April, Alison Grant, the Canadian ambassador to Austria, visited the University of Klagenfurt. As part of an exclusive delegation, Dr. Felix Schniz accompanied the ambassador on a tour of the campus grounds and the Lakeside Science & Technology Park, showcasing the appeal of the interdisciplinary Master’s Programme Game Studies and Engineering, the role of AAU as a hub of the technical sciences, the shared focal points of video-game-focused research in Canada and Austria, and local tech-focused organisations and support networks such as the FTF.

On 16 April, Dr. Felix Schniz (ITEC) organised a guest talk and workshop by Flavia Mazzanti and Manuel Bornell from Immerea (www.immerea.com). Supported by the FTF and co-organised by DI Dr. Martina Tritthart (Visuelle Kultur), the dual event introduced students of both programmes and visitors alike to contemporary digital art in the era of VR technologies, computer graphics, and generative AI from an interdisciplinary, technology-oriented angle. While the guest talk focused on the recent art installations of Immerea, the workshop allowed participants to explore firsthand how art can be created with Blender and other common tools. Both events were well attended, highlighting the appeal of cutting-edge technology research viewed from a tech-interested humanities perspective.

Title: Token-Wise Attention-Guided Semantic Quality Assessment for Compressed Visual Features

Authors: Shien Ke, Changsheng Gao, Hadi Amirpour, Zhihua Wang, Xiaoyan Sun

Event: QoMEX 2026, Cardiff, UK, June 29th – July 3rd, 2026

Abstract: In collaborative and distributed intelligent systems, compressed intermediate features are routinely transmitted and reused, making semantic quality assessment (SQA) crucial for reliable deployment. Recent compressed feature quality assessment (CFQA) benchmarks, however, show that conventional similarity measures often correlate poorly with downstream semantic utility and lack robustness across diverse feature codecs. In this paper, we propose a token-wise, attention-guided method for assessing the semantic quality of compressed features. First, motivated by the observation that many downstream heads normalize and process tokens largely independently, we assess quality at the token level. This token-wise formulation exploits the intrinsic correspondence between the original and reconstructed tokens while reducing cross-token interference. Second, since tokens contribute unequally to downstream task performance, we adopt an attention-guided aggregation scheme: we derive task-adaptive importance weights from DINOv2 self-attention and use them to pool token-wise quality predictions into a global semantic quality score. Third, to accommodate heterogeneous supervision across tasks, we cast CFQA as a regression problem and rescale classification-based rank targets to mitigate label imbalance. Experiments on the CFQA benchmark demonstrate that our method consistently improves PLCC and SROCC across three tasks and four codecs, yielding a practical, codec-agnostic quality interface for next-generation intelligent systems.
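
The attention-guided aggregation step described in the abstract can be illustrated with a minimal sketch. It assumes the per-token quality predictions and the importance weights are already available as plain lists; how the paper derives the weights from DINOv2 self-attention is not reproduced here, and the function name and numbers are hypothetical.

```python
def attention_pooled_quality(token_scores, attention_weights):
    """Pool token-wise quality predictions into one global score.

    The weights are normalized to sum to 1, so the result is a
    weighted average of the per-token scores.
    """
    if not token_scores or len(token_scores) != len(attention_weights):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(w < 0 for w in attention_weights):
        raise ValueError("importance weights must be non-negative")
    total = sum(attention_weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    weighted = sum(s * w for s, w in zip(token_scores, attention_weights))
    return weighted / total


# Hypothetical values: three tokens, the first one matters most.
scores = [0.9, 0.4, 0.7]
weights = [0.6, 0.1, 0.3]
global_quality = attention_pooled_quality(scores, weights)
print(f"global quality = {global_quality:.3f}")
```

With uniform weights the function reduces to a plain mean, which makes the effect of the importance weighting easy to compare against simple averaging.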

Title: Quality-Complexity Trade-off for Sustainable Media Delivery

Authors: Hadi Amirpour, Christian Herglotz, Lingfeng Qu, Wei Zhou, Christian Timmerer

Event: QoMEX 2026, Cardiff, UK, June 29th – July 3rd, 2026

Abstract: Sustainable media delivery increasingly requires joint optimization across perceptual quality, bitrate, and computational cost, yet codec comparisons are often reported only in rate-distortion terms without accounting for energy and (en/de)coding time overheads at scale. This paper analyzes quality–rate–computational cost trade-offs using a large-scale dataset. We first quantify the dominant drivers of bitrate, VMAF, and (en/de)coding user time via interpretable regression models, showing that codec and resolution explain a substantial fraction of the observed variance. We then characterize local sensitivities of bitrate and (en/de)coding user time to incremental increases in VMAF using interpolation in the quality domain and finite-difference derivatives, providing a content-agnostic view of how much additional bitrate and compute, and consequently energy expenditure, is required per unit of quality improvement. To evaluate practical savings, we compute Bjøntegaard Delta metrics relative to a libx264 reference, revealing that large BD-Rate gains can coincide with substantial penalties in (en/de)coding user time, particularly for the most recent video coding standards such as Versatile Video Coding (VVC). Finally, we formulate multi-objective configuration selection as a Binary Linear Program (BLP) that selects one operating point per video by trading perceptual quality against bitrate and (en/de)coding user time; across different weight regimes, the selected codec-resolution-frame-rate distributions shift coherently with system priorities.
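
To make the configuration-selection idea more concrete, here is a minimal sketch under a simplifying assumption: if the only constraint is "exactly one operating point per video" and nothing couples the videos, the binary program decomposes into picking the best-scoring candidate for each video independently. The paper's exact objective and constraints are not reproduced; the class, function names, weights, and normalized values below are hypothetical.

```python
from typing import NamedTuple


class OperatingPoint(NamedTuple):
    label: str      # e.g. codec and resolution
    vmaf: float     # quality, normalized to [0, 1]
    bitrate: float  # normalized to [0, 1]
    time: float     # (en/de)coding user time, normalized to [0, 1]


def utility(p, w_quality, w_rate, w_time):
    """Linear trade-off: reward quality, penalize bitrate and time."""
    return w_quality * p.vmaf - w_rate * p.bitrate - w_time * p.time


def select_per_video(candidates, w_quality, w_rate, w_time):
    """Pick one operating point per video (per-video argmax)."""
    return {
        video: max(points, key=lambda p: utility(p, w_quality, w_rate, w_time))
        for video, points in candidates.items()
    }


# Hypothetical candidates for a single video.
candidates = {
    "video_a": [
        OperatingPoint("libx264 @ 1080p", vmaf=0.80, bitrate=0.60, time=0.10),
        OperatingPoint("VVC @ 1080p", vmaf=0.85, bitrate=0.35, time=0.90),
    ]
}

rate_focused = select_per_video(candidates, w_quality=1.0, w_rate=1.0, w_time=0.0)
time_aware = select_per_video(candidates, w_quality=1.0, w_rate=1.0, w_time=1.0)
print(rate_focused["video_a"].label, "|", time_aware["video_a"].label)
```

In this toy example, ignoring coding time favors the more bitrate-efficient candidate, while weighting time equally shifts the choice to the faster one, mirroring the abstract's observation that selections shift with system priorities. Any constraint that couples videos, such as a shared compute budget, would require an actual integer-programming solver.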

Dragi Kimovski Receives FGCS Outstanding Reviewer Award 2025

We are proud to announce that Dragi Kimovski has been selected as a recipient of the 2025 Outstanding Reviewer Award by the Future Generation Computer Systems journal.

Out of more than 5,400 reviewers worldwide, only 31 were chosen for this distinction, making this recognition highly competitive and a testament to exceptional contributions to the peer-review process. The award was given for the dedication, expertise, and commitment to maintaining high scientific standards, which have played an important role in supporting the quality and integrity of published research.

The full list of awardees will be featured in an upcoming open-access editorial in FGCS (Volume 182, September 2026).

Title: EPS: Efficient Patch Sampling for Video Overfitting in Deep Super-Resolution Model Training

Authors: Yiying Wei, Hadi Amirpour, Jong Hwan Ko, and Christian Timmerer

Abstract: Leveraging the overfitting property of deep neural networks (DNNs) is trending in video delivery systems to enhance video quality within bandwidth limits. Existing approaches transmit overfitted super-resolution (SR) model streams for low-resolution (LR) bitstreams, which are used to reconstruct high-resolution (HR) videos at the decoder. Although these approaches show promising results, the huge computational costs of training a large number of video frames limit their practical applications. To overcome this challenge, we propose an efficient patch sampling method named EPS for video SR network overfitting, which identifies the most valuable training patches from video frames.

To this end, we first present two low-complexity Discrete Cosine Transform (DCT)-based spatial-temporal features to measure the complexity score of each patch directly. By analyzing the histogram distribution of these features, we then categorize all possible patches into different clusters and select training patches from the cluster with the highest spatial-temporal information. The number of sampled patches is adaptive based on the video content, addressing the trade-off between training complexity and efficiency.

Our method reduces the number of training patches by 75.00% to 91.69%, depending on the resolution and number of clusters, while preserving high video quality and greatly improving training efficiency. Our method speeds up patch sampling by up to 82.1× compared to the state-of-the-art patch sampling technique (EMT).
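
The sampling step described in the abstract (group patches by a complexity score, then keep the most informative group) can be sketched as follows. This is only an illustration under stated assumptions: the scores are taken as given rather than computed from DCT features, and equal-width histogram bins stand in for the paper's clustering rule, which is not reproduced here. The function name and values are hypothetical.

```python
def sample_patches(scores, num_clusters):
    """Return indices of patches falling in the highest-score bin.

    Patches are grouped into `num_clusters` equal-width bins over the
    observed score range; only the top bin is kept for training.
    """
    if not scores or num_clusters < 1:
        raise ValueError("need at least one score and one cluster")
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All patches are equally complex; nothing to discard.
        return list(range(len(scores)))
    width = (hi - lo) / num_clusters

    def cluster_of(score):
        # Clamp so the maximum score lands in the last bin.
        return min(int((score - lo) / width), num_clusters - 1)

    top = num_clusters - 1
    return [i for i, s in enumerate(scores) if cluster_of(s) == top]


# Hypothetical per-patch complexity scores.
patch_scores = [0.10, 0.90, 0.40, 0.85, 0.20, 0.95, 0.30, 0.50]
selected = sample_patches(patch_scores, num_clusters=4)
reduction = 1 - len(selected) / len(patch_scores)
print(selected, f"{reduction:.1%} fewer patches")
```

Because the number of patches in the top bin depends on the score distribution, the sample size adapts to the content, and increasing the number of bins narrows the top bin and discards more patches.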

Title: Perception-Inspired Network for Stereo Image Quality Assessment

Authors: Yongli Chang, Guanghui Yue, Bo Zhao, Li Yu, Yakun Ju, Hadi Amirpour, Moncef Gabbouj, and Wei Zhou

Abstract: Existing stereo image quality assessment (SIQA) methods generally have limitations in binocular fusion and fine-grained perception modeling. To address these issues, we propose a Perception-Inspired Network for SIQA that simulates the binocular difference-guided fusion, high-frequency sensitivity, and hierarchical perception mechanisms of the human visual system (HVS). First, a difference-guided binocular fusion (DGBF) module is designed to mimic the binocular difference sensitivity mechanism, which exploits difference information at both the feature level and the image level to optimize binocular fusion. Furthermore, image distortion primarily affects the high-frequency components, which are critical for perceptual quality. To reflect this, we propose a high-frequency enhancement module (HFEM) to simulate the human eye’s sensitivity to edge and texture distortions. Finally, to better achieve fine-grained perception modeling, we propose a hierarchical quality regression strategy that simulates the human perceptual process, from perceiving local details to forming a global quality judgment, thereby achieving a quality prediction more aligned with human subjective evaluation. Experimental results demonstrate that the proposed method outperforms mainstream approaches, achieving a PLCC of 0.9734 on the LIVE I database and a PLCC of 0.9632 on the LIVE II database.