Publication – ITEC Homepage

Real-Time AI-Driven Avatar Generation for Sign Language in HTTP Adaptive Streaming

The 3rd ACM SIGCOMM Workshop on Emerging Multimedia Systems (ACM EMS 2025)

https://conferences.sigcomm.org/sigcomm/2025/workshop/ems/

8 September 2025 // Coimbra, Portugal

Daniele Lorenzi (AAU, Austria), Emanuele Artioli (AAU, Austria), Farzad Tashtarian (AAU, Austria), Christian Timmerer (AAU, Austria)

Abstract: As digital media consumption over the Internet surges globally, ensuring accessibility for all users becomes paramount. For people with hearing impairments, this means providing inclusion beyond classic captioning, which does not convey the full emotional and contextual depth of spoken content. This work addresses this accessibility gap by exploring the use of AI-generated avatars capable of translating speech into sign language in real time. After defining the multifaceted challenges in this domain, we propose a novel AI-driven task partition to animate avatars for accurate and expressive sign language interpretations in live streaming.

June 26, 2025

Publication

Elsevier Displays: Unlocking Implicit Motion for Evaluating Image Complexity

Journal: Displays

Authors: Yixiao Li (Beihang University, China), Xiaoyuan Yang (Beihang University, China), Yanda Meng (University of Exeter, UK), Hadi Amirpour (AAU, AT), Jiang Liu (Cardiff University, UK), Yuqing Luo (Cardiff University, UK), Hantao Liu (Cardiff University, UK), and Wei Zhou (Cardiff University, UK)

Abstract: Image complexity (IC) plays a critical role in both cognitive science and multimedia computing, influencing visual aesthetics, emotional responses, and tasks such as image classification and enhancement. However, defining and quantifying IC remains challenging due to its multifaceted nature, which encompasses both objective attributes (e.g., detail, structure) and subjective human perception. While traditional methods rely on entropy-based or multidimensional approaches, and recent advances employ machine learning and shallow neural networks, these techniques often fail to fully capture the subjective aspects of IC. Inspired by the fact that the human visual system inherently perceives implicit motion in static images, we propose a novel approach to address this gap by explicitly incorporating hidden motion into IC assessment. We introduce the motion-inspired image complexity assessment metric (MICM) as a new framework for this purpose. MICM introduces a dual-branch architecture: One branch extracts spatial features from static images, while the other generates short video sequences to analyze latent motion dynamics. To ensure meaningful motion representation, we design a hierarchical loss function that aligns video features with text prompts derived from image-to-text models, refining motion semantics at both local (i.e., frame and word) and global levels. Experiments on three public image complexity assessment (ICA) databases demonstrate that our approach, MICM, significantly outperforms state-of-the-art methods, validating its effectiveness. The code will be publicly available upon acceptance of the paper.

June 23, 2025

MMC, Publication

Journal paper accepted – Elsevier SPIC: 360-Degree Video Super Resolution and Quality Enhancement Challenge: Methods and Results

Authors: Ahmed Telili (TII, UAE), Wassim Hamidouche (TII, UAE), Brahim Farhat (TII, UAE), Hadi Amirpour (AAU, Austria), Christian Timmerer (AAU, Austria), Ibrahim Khadraoui (TII, UAE), Jiajie Lu (Politecnico di Milano, Italy), The Van Le (IVCL, South Korea), Jeonneung Baek (IVCL, South Korea), Jin Young Lee (IVCL, South Korea), Yiying Wei (AAU, Austria), Xiaopeng Sun (Meituan Inc., China), Yu Gao (Meituan Inc., China), JianCheng Huang (Meituan Inc., China), and Yujie Zhong (Meituan Inc., China)

Journal: Signal Processing: Image Communication

Abstract: Omnidirectional (360-degree) video is rapidly gaining popularity due to advancements in immersive technologies like virtual reality (VR) and extended reality (XR). However, real-time streaming of such videos, particularly in live mobile scenarios such as unmanned aerial vehicles (UAVs), is hindered by limited bandwidth and strict latency constraints. While traditional methods such as compression and adaptive resolution are helpful, they often compromise video quality and introduce artifacts that diminish the viewer’s experience. Additionally, the unique spherical geometry of 360-degree video, with its wide field of view, presents challenges not encountered in traditional 2D video. To address these challenges, we initiated the 360-degree Video Super Resolution and Quality Enhancement challenge. This competition encourages participants to develop efficient machine learning (ML)-powered solutions to enhance the quality of low-bitrate compressed 360-degree videos, under two tracks focusing on 2× and 4× super-resolution (SR). In this paper, we outline the challenge framework, detailing the two competition tracks and highlighting the SR solutions proposed by the top-performing models. We assess these models within a unified framework, considering (i) quality enhancement, (ii) bitrate gain, and (iii) computational efficiency. Our findings show that lightweight single-frame models can effectively balance visual quality and runtime performance under constrained conditions, setting strong baselines for future research. These insights offer practical guidance for advancing real-time 360-degree video streaming, particularly in bandwidth-limited immersive applications.

June 10, 2025

Announcement, Publication

Paper accepted: SCAREY: Location-Aware Service Lifecycle Management

Authors: Kurt Horvath, Dragi Kimovski, Radu Prodan

Venue: 2025 IEEE International Conference on Edge Computing and Communications (IEEE EDGE 2025), July 7-12, Helsinki, Finland

Abstract: Scheduling services within the computing continuum is complex due to the dynamic interplay of the Edge, Fog, and Cloud resources, each offering distinct computational and networking advantages. This paper introduces SCAREY, a user location-aided service lifecycle management framework based on state machines. SCAREY addresses critical service discovery, provisioning, placement, and monitoring challenges by providing unified dynamic state machine-based lifecycle management, allowing instances to transition between discoverable and non-discoverable states based on demand. It incorporates a scalable service deployment algorithm to adjust the number of instances and employs network measurements to optimize service placement, ensuring minimal latency and enhancing sustainability. Real-world evaluations demonstrate a 73% improvement in service discovery and acquisition times, 45% lower operating costs, and over 57% lower power consumption and CO2 emissions compared to existing related methods.
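The demand-driven switch between discoverable and non-discoverable states described in the abstract can be pictured as a small state machine. The following is only a minimal sketch, not SCAREY's implementation: the state names follow the abstract, while the event names (`demand_rises`, `demand_drops`) and the `step` helper are hypothetical and chosen purely for illustration.

```python
from enum import Enum


class State(Enum):
    NON_DISCOVERABLE = "non-discoverable"
    DISCOVERABLE = "discoverable"


# Hypothetical transition table: (current state, event) -> next state.
# The event names are assumptions made for this sketch.
TRANSITIONS = {
    (State.NON_DISCOVERABLE, "demand_rises"): State.DISCOVERABLE,
    (State.DISCOVERABLE, "demand_drops"): State.NON_DISCOVERABLE,
}


def step(state: State, event: str) -> State:
    """Return the next state; events with no matching entry leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


def run(events, start=State.NON_DISCOVERABLE):
    """Apply a sequence of events and return the visited states (excluding the start)."""
    trace = []
    state = start
    for event in events:
        state = step(state, event)
        trace.append(state)
    return trace
```

Keeping the transitions in a table rather than in nested conditionals makes it easy to add further lifecycle states (for example provisioning or monitoring) without touching `step`.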

May 19, 2025

Announcement, Publication

Paper accepted: EnergyLess: An Energy-Aware Serverless Workflow Batch Orchestration on the Computing Continuum

We are happy to announce that our paper “EnergyLess: An Energy-Aware Serverless Workflow Batch Orchestration on the Computing Continuum” (by Reza Farahani and Radu Prodan) has been accepted for IEEE CLOUD 2025, which will take place in Helsinki, Finland, in July 2025.

Venue: IEEE International Conference on Cloud Computing 2025 (IEEE CLOUD 2025)

Abstract: Serverless cloud computing is increasingly adopted for workflow management, optimizing resource utilization for providers while lowering costs for customers. Integrating edge computing into this paradigm enhances scalability and efficiency, enabling seamless workflow distribution across geographically dispersed resources on the computing continuum. However, existing serverless workflow orchestration methods on the computing continuum often prioritize time and cost objectives, neglecting energy consumption and carbon footprint. This paper introduces EnergyLess, a multi-objective concurrent serverless workflow batch orchestration service for the computing continuum. EnergyLess decomposes workflow functions within a batch into finer-grained sub-functions and schedules either the original or sub-function versions to appropriate regions and instances on the continuum, improving energy consumption, carbon footprint, economic cost, and completion time while considering individual workflow requirements and resource constraints. We formulate the problem as a mixed-integer nonlinear programming (MINLP) model and propose three lightweight heuristic algorithms for function decomposition and scheduling. Evaluations on a large-scale computing continuum testbed with realistic workflows, spanning AWS Lambda, Google Cloud Functions (GCF), and 325 fog and edge instances across six regions demonstrate that EnergyLess improves cost efficiency by 75%, completion time by 6%, energy consumption by 15%, and CO2 emissions by 20% for a batch size of 300, compared to three baseline methods.
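To give a feel for what trading off several objectives at once can look like, the sketch below uses a generic weighted-sum scalarization to pick one placement among a few candidates. This is not the MINLP model or any of the three heuristics from the paper; the function name `pick_placement`, the objective keys, and the sample numbers are all hypothetical.

```python
def pick_placement(options, weights):
    """Pick the option with the lowest weighted, normalized score.

    options: dict mapping a placement name to a dict of objective values
             (lower is better for every objective).
    weights: dict mapping an objective name to its non-negative weight.
    Both structures are assumptions made for this sketch.
    """
    # Normalize each objective by its maximum across options so that
    # objectives with different units become comparable.
    maxima = {
        key: max(values[key] for values in options.values()) or 1.0
        for key in weights
    }

    def score(values):
        return sum(w * values[key] / maxima[key] for key, w in weights.items())

    return min(options, key=lambda name: score(options[name]))


# Hypothetical candidates with made-up objective values.
candidates = {
    "edge": {"energy": 2.0, "cost": 5.0, "time": 1.0},
    "cloud": {"energy": 6.0, "cost": 2.0, "time": 3.0},
}
balanced = pick_placement(candidates, {"energy": 1.0, "cost": 1.0, "time": 1.0})
cost_only = pick_placement(candidates, {"cost": 1.0})
```

Changing the weights shifts the choice: with equal weights the made-up edge option wins, while weighting cost alone favors the cloud option.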

May 13, 2025

Announcement, Publication

Paper accepted: Osmotic Learning: A Self-Supervised Paradigm for Decentralized Contextual Data Representation

Authors: Mario Colosi (University of Messina, Italy), Reza Farahani (University of Klagenfurt, Austria), Maria Fazio (University of Messina, Italy), Radu Prodan (University of Innsbruck, Austria), Massimo Villari (University of Messina, Italy)

Venue: International Joint Conference on Neural Networks (IJCNN), 30 June – 5 July 2025, Rome, Italy

Abstract: Data within a specific context gains deeper significance beyond its isolated interpretation. In distributed systems, interdependent data sources reveal hidden relationships and latent structures, representing valuable information for many applications. This paper introduces Osmotic Learning (OSM-L), a self-supervised distributed learning paradigm designed to uncover higher-level latent knowledge from distributed data. The core of OSM-L is osmosis, a process that synthesizes dense and compact representation by extracting contextual information, eliminating the need for raw data exchange between distributed entities. OSM-L iteratively aligns local data representations, enabling information diffusion and convergence into a dynamic equilibrium that captures contextual patterns. During training, it also identifies correlated data groups, functioning as a decentralized clustering mechanism. Experimental results confirm OSM-L’s convergence and representation capabilities on structured datasets, achieving over 0.99 accuracy in local information alignment while preserving contextual integrity.

April 1, 2025

Announcement, Project, Publication

Journal article accepted: ACM TOMM: HTTP Adaptive Streaming: A Review on Current Advances and Future Challenges

ACM Transactions on Multimedia Computing, Communications, and Applications

Christian Timmerer (AAU, AT), Hadi Amirpour (AAU, AT), Farzad Tashtarian (AAU, AT), Samira Afzal (AAU, AT), Amr Rizk (Leibniz University Hannover, DE), Michael Zink (University of Massachusetts Amherst, US), and Hermann Hellwagner (AAU, AT)

Abstract: Video streaming has evolved from push-based, broad-/multicasting approaches with dedicated hard-/software infrastructures to pull-based unicast schemes utilizing existing Web-based infrastructure to allow for better scalability. In this article, we provide an overview of the foundational principles of HTTP adaptive streaming (HAS), from video encoding to end user consumption, while focusing on the key advancements in adaptive bitrate algorithms, quality of experience (QoE), and energy efficiency. Furthermore, the article highlights the ongoing challenges of optimizing network infrastructure, minimizing latency, and managing the environmental impact of video streaming. Finally, future directions for HAS, including immersive media streaming and neural network-based video codecs, are discussed, positioning HAS at the forefront of next-generation video delivery technologies.

Keywords: HTTP Adaptive Streaming, HAS, DASH, Video Coding, Video Delivery, Video Consumption, Quality of Experience, QoE
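As a small illustration of the adaptive bitrate algorithms mentioned in the abstract, the sketch below shows one of the simplest throughput-based rules: pick the highest representation whose bitrate fits within a safety fraction of the measured throughput. It is a generic textbook-style example, not an algorithm taken from the article; the function name, the ladder values, and the `safety` factor of 0.8 are assumptions.

```python
def select_bitrate(ladder_kbps, throughput_kbps, safety=0.8):
    """Return the highest bitrate in the ladder that fits the throughput budget.

    ladder_kbps: available representation bitrates in kbit/s (any order).
    throughput_kbps: estimated network throughput in kbit/s.
    safety: fraction of the throughput the player is willing to use
            (hypothetical default chosen for this sketch).
    """
    budget = throughput_kbps * safety
    candidates = [b for b in sorted(ladder_kbps) if b <= budget]
    # Fall back to the lowest representation when nothing fits the budget.
    return candidates[-1] if candidates else min(ladder_kbps)


ladder = [500, 1500, 3000, 6000]
choice_good_network = select_bitrate(ladder, 4000)
choice_poor_network = select_bitrate(ladder, 300)
```

Practical players typically combine such a throughput estimate with buffer occupancy and quality information, which is where the QoE and energy considerations discussed in the article come in.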

https://athena.itec.aau.at/2025/03/acm-tomm-http-adaptive-streaming-a-review-on-current-advances-and-future-challenges/

March 24, 2025

Announcement, Publication

Paper accepted: ICME 2025: Neural Representations for Scalable Video Coding

IEEE International Conference on Multimedia & Expo (ICME) 2025

Authors: Yiying Wei (AAU, Austria), Hadi Amirpour (AAU, Austria), and Christian Timmerer (AAU, Austria)

Abstract: Scalable video coding encodes a video stream into multiple layers so that it can be decoded at different levels of quality/resolution, depending on the device’s capabilities or the available network bandwidth. Recent advances in implicit neural representation (INR)-based video codecs have shown compression performance competitive with both traditional and other learning-based methods. In INR approaches, a neural network is trained to overfit a video sequence, and its parameters are compressed to create a compact representation of the video content. While they achieve promising results, existing INR-based codecs require training separate networks for each resolution/quality of a video, which makes scalable compression challenging. In this paper, we propose Neural representations for Scalable Video Coding (NSVC), which encodes multi-resolution/-quality videos into a single neural network comprising multiple layers. The base layer (BL) of the neural network encodes video streams with the lowest resolution/quality. Enhancement layers (ELs) encode additional information that can be used to reconstruct a higher resolution/quality video during decoding using the BL as a starting point. This multi-layered structure allows the scalable bitstream to be truncated to adapt to the client’s bandwidth conditions or computational decoding requirements. Experimental results show that NSVC outperforms AVC’s Scalable Video Coding (SVC) extension and surpasses HEVC’s scalable extension (SHVC) in terms of VMAF. Additionally, NSVC achieves comparable decoding speeds at high resolutions/qualities.
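The idea of truncating a layered bitstream to match the client's bandwidth can be sketched in a few lines. The example below is only an illustration of base-plus-enhancement layer selection in general, not NSVC itself: the function name and the per-layer bitrates are hypothetical, and it assumes each enhancement layer depends on all layers below it.

```python
def layers_to_fetch(layer_rates_kbps, bandwidth_kbps):
    """Return how many layers (base layer first) fit into the bandwidth.

    layer_rates_kbps[0] is the base layer (BL); the remaining entries are
    enhancement layers (ELs) in dependency order. The base layer is always
    counted, since nothing can be decoded without it.
    """
    total = layer_rates_kbps[0]
    count = 1
    for rate in layer_rates_kbps[1:]:
        if total + rate > bandwidth_kbps:
            break  # truncate the bitstream here
        total += rate
        count += 1
    return count


# Hypothetical rates: BL at 400 kbit/s, two ELs at 600 and 1000 kbit/s.
rates = [400, 600, 1000]
fetched_mid = layers_to_fetch(rates, 1200)
fetched_low = layers_to_fetch(rates, 100)
```

Because the layers are cumulative, the client never needs a separate stream per quality level; it simply stops fetching at the last layer that fits.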

March 21, 2025

Announcement, Publication

Journal article accepted: VQM4HAS: A Real-time Quality Metric for HEVC Videos in HTTP Adaptive Streaming

We are glad that the paper was accepted for publication in IEEE Transactions on Multimedia.

Authors: Hadi Amirpour (AAU, AT), Jingwen Zhu (Nantes University, FR), Wei Zhou (Cardiff University, UK), Patrick Le Callet (Nantes University, FR), and Christian Timmerer (AAU, AT)

Abstract: In HTTP Adaptive Streaming (HAS), a video is encoded at various bitrate-resolution pairs, collectively known as the bitrate ladder, allowing users to select the most suitable representation based on their network conditions. Optimizing this set of pairs to enhance the Quality of Experience (QoE) requires accurately measuring the quality of these representations. VMAF and ITU-T’s P.1204.3 are highly reliable metrics for assessing the quality of representations in HAS. However, in practice, using these metrics for optimization is often impractical for live streaming applications due to their high computational costs and the large number of bitrate-resolution pairs in the bitrate ladder that need to be evaluated. To address their high complexity, our paper introduces a new method called VQM4HAS, which extracts low-complexity features including (i) video complexity features, (ii) frame-level encoding statistics logged during the encoding process, and (iii) lightweight video quality metrics. These extracted features are then fed into a regression model to predict VMAF and P.1204.3, respectively.

The VQM4HAS model is designed to operate on a per bitrate-resolution pair, per-resolution, and cross-representation basis, optimizing quality predictions across different HAS scenarios. Our experimental results demonstrate that VQM4HAS achieves a high correlation with VMAF and P.1204.3, with Pearson correlation coefficients (PCC) ranging from 0.95 to 0.96 for VMAF and 0.97 to 0.99 for P.1204.3, depending on the resolution. Despite achieving a high correlation with VMAF and P.1204.3, VQM4HAS exhibits significantly less complexity than both metrics, with 98% and 99% less complexity for VMAF and P.1204.3, respectively, making it suitable for live streaming scenarios.
We also conduct a feature importance analysis to further reduce the complexity of the proposed method. Furthermore, we evaluate the effectiveness of our method by using it to predict subjective quality scores. The results show that VQM4HAS achieves a higher correlation with subjective scores at various resolutions, despite its minimal complexity.
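For readers unfamiliar with the Pearson correlation coefficient (PCC) used above to compare predicted and reference quality scores, here is a minimal sketch of how it is computed. The sample score lists are made up for illustration and are not data from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance term and the two standard-deviation terms (unnormalized;
    # the 1/n factors cancel in the ratio).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)


# Hypothetical reference scores and predictions for a handful of encodes.
reference = [35.0, 52.0, 68.0, 81.0, 93.0]
predicted = [38.0, 50.0, 70.0, 79.0, 95.0]
pcc = pearson(reference, predicted)
```

A value close to 1 means the predictions rise and fall almost linearly with the reference metric, which is the sense in which the abstract reports PCC values between 0.95 and 0.99.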

March 17, 2025

Announcement, Publication

Papers accepted @ Intel4EC Workshop 2025

The following papers have been accepted at the Intel4EC Workshop 2025, which will be held on June 4, 2025, in Milan, Italy, in conjunction with the 39th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2025).

Title: 6G Infrastructures for Edge AI: An Analytical Perspective

Authors: Kurt Horvath, Shpresa Tuda*, Blerta Idrizi*, Stojan Kitanov*, Fisnik Doko*, Dragi Kimovski (*Mother Teresa University Skopje, North Macedonia)

Abstract: The convergence of Artificial Intelligence (AI) and the Internet of Things has accelerated the development of distributed, network-sensitive applications, necessitating ultra-low latency, high throughput, and real-time processing capabilities. While 5G networks represent a significant technological milestone, their ability to support AI-driven edge applications remains constrained by performance gaps observed in real-world deployments. This paper addresses these limitations and highlights critical advancements needed to realize a robust and scalable 6G ecosystem optimized for AI applications. Furthermore, we conduct an empirical evaluation of 5G network infrastructure in central Europe, with latency measurements ranging from 61 ms to 110 ms across different close geographical areas. These values exceed the requirements of latency-critical AI applications by approximately 270%, revealing significant shortcomings in current deployments. Building on these findings, we propose a set of recommendations to bridge the gap between existing 5G performance and the requirements of next-generation AI applications.

Title: Blockchain consensus mechanisms for democratic voting environments

Authors: Thomas Auer, Kurt Horvath, Dragi Kimovski

Abstract: Democracy relies on robust voting systems to ensure transparency, fairness, and trust in electoral processes. Despite its foundational role, voting mechanisms – both manual and electronic – remain vulnerable to threats such as vote manipulation, data loss, and administrative interference. These vulnerabilities highlight the need for secure, scalable, and cost-efficient alternatives to safeguard electoral integrity. The fully decentralized voting system leverages blockchain technology to overcome critical challenges in modern voting systems, including scalability, cost-efficiency, and transaction throughput. By eliminating the need for a centralized authority, the system ensures transparency, security, and real-time monitoring by integrating Distributed Ledger Technologies. This novel architecture reduces operational costs, enhances voter anonymity, and improves scalability, achieving significantly lower costs for 1,000 votes than traditional voting methods.

The system introduces a formalized decentralized voting model that adheres to constitutional requirements and practical standards, making it suitable for implementation in direct and representative democracies. Additionally, the design accommodates high transaction volumes without compromising performance, ensuring reliable operation even in large-scale elections. The results demonstrate that this system outperforms classical approaches regarding efficiency, security, and affordability, paving the way for broader adoption of blockchain-based voting solutions.

March 17, 2025