Authors: Kurt Horvath (University of Klagenfurt, Austria), Dragi Kimovski (University of Klagenfurt, Austria), Stojan Kitanov (Mother Theresa University Skopje, Macedonia), Radu Prodan (University of Klagenfurt, Austria)

Event: 2025 10th International Conference on Information and Network Technologies (ICINT), March 12–14, 2025, Melbourne (Australia)

Abstract: The rapid digitalization of urban infrastructure opens the path to smart cities, where IoT-enabled infrastructure enhances public safety and efficiency. This paper presents a 6G and AI-enabled framework for traffic safety enhancement, focusing on real-time detection and classification of emergency vehicles and leveraging 6G as the latest global communication standard. The system integrates sensor data acquisition, convolutional neural network-based threat detection, and user alert dissemination across the use case's software modules. We define the latency requirements for such a system, segmenting the end-to-end latency into computational and networking components. Our empirical evaluation demonstrates the impact of vehicle speed and user trajectory on system reliability. The results provide insights for network operators and smart city service providers, emphasizing the critical role of low-latency communication and how networks can enable relevant services for traffic safety.
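The latency segmentation described in the abstract can be illustrated with a toy calculation. This sketch is not from the paper; the function names and all numbers are illustrative assumptions that merely show how an end-to-end budget splits into computational and networking parts, and why vehicle speed matters:

```python
def e2e_latency(t_capture, t_uplink, t_inference, t_downlink):
    """End-to-end latency (seconds) split into computational parts
    (sensor capture, CNN inference) and networking parts (uplink,
    downlink). Illustrative decomposition, not the paper's model."""
    return t_capture + t_uplink + t_inference + t_downlink

def distance_travelled(speed_kmh, latency_s):
    """Distance (metres) a vehicle covers before an alert arrives."""
    return speed_kmh / 3.6 * latency_s
```

For example, with a 0.2 s end-to-end budget, a vehicle at 72 km/h (20 m/s) moves 4 m before the alert lands, which is why the networking share of the budget is critical at higher speeds.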



CLIP-DQA: Blindly Evaluating Dehazed Images from Global and Local Perspectives Using CLIP

The IEEE International Symposium on Circuits and Systems (IEEE ISCAS 2025)

https://2025.ieee-iscas.org/

25–28 May 2025 // London, United Kingdom

Yirui Zeng (Cardiff University, UK), Jun Fu (Cardiff University), Hadi Amirpour (AAU, Austria), Huasheng Wang (Alibaba Group), Guanghui Yue (Shenzhen University, China), Hantao Liu (Cardiff University), Ying Chen (Alibaba Group), Wei Zhou (Cardiff University)

Abstract: Blind dehazed image quality assessment (BDQA), which aims to accurately predict the visual quality of dehazed images without any reference information, is essential for the evaluation, comparison, and optimization of image dehazing algorithms. Existing learning-based BDQA methods have achieved remarkable success, but the small scale of DQA datasets limits their performance. To address this issue, in this paper, we propose to adapt Contrastive Language-Image Pre-Training (CLIP), pre-trained on large-scale image-text pairs, to the BDQA task. Specifically, inspired by the fact that the human visual system understands images based on hierarchical features, we take global and local information of the dehazed image as the input of CLIP. To accurately map the input hierarchical information of dehazed images into the quality score, we tune both the vision branch and language branch of CLIP with prompt learning. Experimental results on two authentic DQA datasets demonstrate that our proposed approach, named CLIP-DQA, achieves more accurate quality predictions than existing BDQA methods.
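The global-plus-local input idea can be sketched in a few lines. This is a hedged illustration only: it assumes a generic image array and a simple score-fusion rule, not CLIP-DQA's actual preprocessing or prompt-tuned scoring:

```python
import numpy as np

def global_local_views(img, patch=32):
    """Build one coarse global view plus non-overlapping local patches,
    mirroring the hierarchical (global + local) input idea.
    `img` is an (H, W, 3) array; the stride-based downsampling is a
    stand-in for whatever resizing a real pipeline would use."""
    h, w, _ = img.shape
    global_view = img[::max(1, h // patch), ::max(1, w // patch)][:patch, :patch]
    local_views = [img[i:i + patch, j:j + patch]
                   for i in range(0, h - patch + 1, patch)
                   for j in range(0, w - patch + 1, patch)]
    return global_view, local_views

def fuse_quality(global_score, local_scores, alpha=0.5):
    """Weighted fusion of a global quality score with the mean of the
    per-patch local scores (illustrative fusion rule)."""
    return alpha * global_score + (1 - alpha) * (sum(local_scores) / len(local_scores))
```

In the real method, each view would be encoded by CLIP's vision branch and mapped to a score via learned prompts; here the scores are just placeholders.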

Multi-resolution Encoding for HTTP Adaptive Streaming using VVenC

The IEEE International Symposium on Circuits and Systems (IEEE ISCAS 2025)

https://2025.ieee-iscas.org/

25–28 May 2025 // London, United Kingdom

Kamran Qureshi (AAU, Austria), Hadi Amirpour (AAU, Austria), Christian Timmerer (AAU, Austria)

Abstract: HTTP Adaptive Streaming (HAS) is a widely adopted method for delivering video content over the Internet, requiring each video to be encoded at multiple bitrates and resolution pairs, known as representations, to adapt to various network conditions and device capabilities. This multi-bitrate encoding introduces significant challenges due to the computational and time-intensive nature of encoding multiple representations. Conventional approaches often encode these videos independently without leveraging similarities between different representations of the same input video. This paper proposes an accelerated multi-resolution encoding strategy that utilizes representations of lower resolutions as references to speed up the encoding of higher resolutions when using Versatile Video Coding (VVC); specifically in VVenC, an optimized open-source software implementation. For multi-resolution encoding, a mid-bitrate representation serves as the reference, allowing interpolated encoded partition data to efficiently guide the partitioning process in higher resolutions. The proposed approach uses shared encoding information to reduce redundant calculations, thereby optimizing the partitioning decisions. Experimental results demonstrate that the proposed technique reduces encoding time by up to 17% compared to the medium preset across videos of varying complexities, with a minimal BDBR/BDT of 0.12 compared to the fast preset.
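The core reuse idea, scaling a lower-resolution partition map up so it can constrain the higher-resolution partition search, can be sketched as follows. This is an illustrative sketch, not VVenC code: the depth-map representation, nearest-neighbour upscaling, and the +/-1 depth margin are all assumptions standing in for the paper's interpolated partition data:

```python
import numpy as np

def upscale_partition_map(depth_map, scale=2):
    """Nearest-neighbour upscaling of a low-resolution CU split-depth
    map so each reference block guides the co-located blocks at the
    higher resolution."""
    return np.repeat(np.repeat(depth_map, scale, axis=0), scale, axis=1)

def prune_depths(ref_depth, margin=1, max_depth=4):
    """Allowed split-depth range per block: reference depth +/- margin.
    The encoder would then skip rate-distortion tests outside [lo, hi]."""
    lo = np.clip(ref_depth - margin, 0, max_depth)
    hi = np.clip(ref_depth + margin, 0, max_depth)
    return lo, hi
```

Restricting the search to a window around the reference depth is what removes the redundant partitioning calculations at the higher resolution.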


Improving the Efficiency of VVC using Partitioning of Reference Frames

The IEEE International Symposium on Circuits and Systems (IEEE ISCAS 2025)

https://2025.ieee-iscas.org/

25–28 May 2025 // London, United Kingdom

Kamran Qureshi (AAU, Austria), Hadi Amirpour (AAU, Austria), Christian Timmerer (AAU, Austria)

Abstract: In response to the growing demand for high-quality videos, a new coding standard, Versatile Video Coding (VVC), was released in 2020. VVC is based on the same hybrid coding architecture as its predecessor, High-Efficiency Video Coding (HEVC), providing a bitrate reduction of approximately 50% for the same subjective quality. VVC extends HEVC’s Coding Tree Unit (CTU) partitioning with more flexible block sizes, increasing its encoding complexity. Optimization is essential to making efficient use of VVC in practical applications. VVenC, an optimized open-source VVC encoder, introduces multiple presets to address the trade-off between compression efficiency and encoder complexity. Although an optimized set of encoding tools has been selected for each preset, the rate-distortion (RD) search space in the encoder presets still poses a challenge for efficient encoder implementations. This paper proposes Early Termination using Reference Frames (ETRF), which improves the trade-off between encoding efficiency and time complexity and positions itself as a new preset between the medium and fast presets. The CTU partitioning map of the reference frames present in lower temporal layers is employed to accelerate the encoding of frames in higher temporal layers. The results show a reduction in the encoding time of around 22% compared to the medium preset. Specifically, for videos with high spatial and temporal complexities, which typically require longer encoding times, the proposed method shows an improved BDBR/BDT compared to the fast preset.
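The early-termination principle, stop exploring deeper splits once the current CU depth passes the co-located depth in the lower-temporal-layer reference frame, can be sketched as a toy rule. The slack parameter and the per-block depth-map representation are illustrative assumptions, not the paper's exact criterion:

```python
def etrf_allow_split(cur_depth, ref_depth, slack=1):
    """Allow testing a deeper split only while the current depth stays
    below the co-located reference-frame depth plus a small slack."""
    return cur_depth < ref_depth + slack

def count_tested_depths(ref_map, max_depth=4, slack=1):
    """Compare depth candidates tested per block: exhaustive search
    vs. early termination guided by the reference depth map
    (`ref_map` is a flat list of co-located reference depths)."""
    exhaustive = len(ref_map) * (max_depth + 1)
    pruned = sum(min(ref + slack, max_depth) + 1 for ref in ref_map)
    return exhaustive, pruned
```

Blocks whose reference was coded shallowly (e.g. static background) skip most depth candidates, which is where the encoding-time saving comes from.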


Authors: Farzad Tashtarian (Alpen-Adria Universität Klagenfurt, Austria); Mahdi Dolati (Sharif University of Technology, Iran); Daniele Lorenzi (University of Klagenfurt, Austria); Mojtaba Mozhganfar (University of Tehran, Iran); Sergey Gorinsky (IMDEA Networks Institute, Spain); Ahmad Khonsari (University of Tehran, Iran); Christian Timmerer (Alpen-Adria-Universität Klagenfurt & Bitmovin, Austria); Hermann Hellwagner (Klagenfurt University, Austria)

Event: IEEE INFOCOM 2025, 19–22 May 2025 // London, United Kingdom

Abstract: Live streaming routinely relies on the Hypertext Transfer Protocol (HTTP) and content delivery networks (CDNs) to scalably disseminate videos to diverse clients. A bitrate ladder refers to a list of bitrate-resolution pairs, or representations, used for encoding a video. A promising trend in HTTP-based video streaming is to adapt not only the client’s representation choice but also the bitrate ladder during the streaming session. This paper examines the problem of multi-live streaming, where an encoding service performs coordinated CDN-aware bitrate ladder adaptation for multiple live streams delivered to heterogeneous clients in different zones via CDN edge servers. We design ALPHAS, a practical and scalable system for multi-live streaming that accounts for CDNs’ bandwidth constraints and encoder’s computational capabilities and also supports stream prioritization. ALPHAS, aware of both video content and streaming context, seamlessly integrates with the end-to-end streaming pipeline and operates in real time transparently to clients and encoding algorithms. We develop a cloud-based ALPHAS implementation and evaluate it through extensive real-world and trace-driven experiments against four prominent baselines that encode each stream independently. The evaluation shows that ALPHAS outperforms the baselines, improving quality of experience, end-to-end latency, and per-stream processing by up to 23%, 21%, and 49%, respectively.


Authors: Emanuele Artioli (Alpen-Adria Universität Klagenfurt, Austria), Daniele Lorenzi (Alpen-Adria Universität Klagenfurt, Austria), Farzad Tashtarian (Alpen-Adria Universität Klagenfurt, Austria), Christian Timmerer (Alpen-Adria Universität Klagenfurt, Austria)

Event: ACM 4th Mile-High Video Conference (MHV’25), 18–20 February 2025, Denver, CO, USA

Abstract: The demand for accessible, multilingual video content has grown significantly with the global rise of streaming platforms, social media, and online learning. Traditional solutions for making content accessible across languages include subtitles, including automatically generated ones as offered by YouTube, and synthesized voiceovers, as offered, for example, by the Yandex Browser. Subtitles are cost-effective and reflect the original voice of the speaker, which is often essential for authenticity. However, they require viewers to divide their attention between reading text and watching visuals, which can diminish engagement, especially for highly visual content. Synthesized voiceovers, on the other hand, eliminate this need by providing an auditory translation. Still, they typically lack the emotional depth and unique vocal characteristics of the original speaker, which can affect the viewing experience and disconnect audiences from the intended pathos of the content. A straightforward solution would be to have the original actor “perform” in every language, thereby preserving the traits that define their character or narration style. However, recording actors in multiple languages is impractical, time-intensive, and expensive, especially for widely distributed media.

By leveraging generative AI, we aim to develop a client-side tool, to incorporate in a dedicated video streaming player, that combines the accessibility of multilingual dubbing with the authenticity of the original speaker’s performance, effectively allowing a single actor to deliver their voice in any language. To the best of our knowledge, no current streaming system can capture the speaker’s unique voice or emotional tone.

Authors: Daniele Lorenzi (Alpen-Adria Universität Klagenfurt, Austria), Farzad Tashtarian (Alpen-Adria Universität Klagenfurt, Austria), Christian Timmerer (Alpen-Adria Universität Klagenfurt, Austria)

Event: ACM 4th Mile-High Video Conference (MHV’25), 18–20 February 2025, Denver, CO, USA

Abstract: HTTP Adaptive Streaming (HAS) dominates video delivery but faces sustainability issues due to its energy demands. Current adaptive bitrate (ABR) algorithms prioritize quality, neglecting the energy costs of higher bitrates. Super-resolution (SR) can enhance quality but increases energy use, especially for GPU-equipped devices in competitive networks. RecABR addresses these challenges by clustering clients based on device attributes (e.g., GPU, resolution) and optimizing parameters via linear programming. This reduces computational overhead and ensures energy-efficient, quality-aware recommendations. Using metrics like VMAF and compressed SR models, RecABR minimizes storage and processing costs, making it scalable for CDN edge deployment.
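The clustering step RecABR builds on can be illustrated with a toy grouping-and-recommendation rule. This is a hedged sketch under simple assumptions: the attribute keys, the client dictionaries, and the min/max bitrate rule are all illustrative stand-ins for the paper's linear-programming optimization:

```python
from collections import defaultdict

def cluster_clients(clients):
    """Group clients by coarse device attributes (GPU availability and
    display resolution), a proxy for RecABR-style client clustering."""
    groups = defaultdict(list)
    for c in clients:
        groups[(c["gpu"], c["resolution"])].append(c["id"])
    return dict(groups)

def recommend_bitrate(cluster_key, ladder):
    """Toy per-cluster rule: GPU-equipped clusters fetch a lower bitrate
    and upscale with super-resolution, saving network energy; clusters
    without a GPU get the top bitrate of the ladder instead."""
    gpu, _resolution = cluster_key
    return min(ladder) if gpu else max(ladder)
```

Computing one recommendation per cluster, rather than per client, is what keeps the approach cheap enough for CDN edge deployment.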

Magdalena participated in the “Game Over?” conference from November 14th to 16th, themed “Dystopia x Utopia x Video Games.” Together with Iris van der Horst (MEd), she presented the talk “From Oppression to Liberation: Postcolonial Perspectives on the Dystopian World of Xenoblade Chronicles 3,” which focused on how the game displays a critical dystopia and how it criticizes colonial structures by magnifying the abstract concepts of third space and contact zone, depicting them in a very concrete and creative way. The presentation thus combined theories of utopianism and postcolonialism.

Authors: Narges Mehran, Zahra Najafabadi Samani, Samira Afzal, Radu Prodan, Frank Pallas and Peter Dorfinger

Event: The 40th ACM/SIGAPP Symposium On Applied Computing https://www.sigapp.org/sac/sac2025/

Abstract:

The popularity of asynchronous data exchange patterns has recently increased, as evidenced by an Alibaba trace analysis showing that 23% of the communication between microservices uses this method. Such workloads necessitate methods for reducing their dataflow processing and completion times. Moreover, a prediction method is needed to forecast the future requirements of such microservices and (re-)schedule them accordingly. We therefore investigate prediction-based scheduling of asynchronous dataflow processing applications, considering the stochastic changes caused by dynamic user requirements.

Moreover, we present PreMatch, a microservice scaling and scheduling method that combines a gradient-boosting prediction strategy with ranking and game-theoretic matching principles. First, PreMatch predicts the number of microservice replicas; the ranking method then orders the microservice replicas and devices based on processing and transmission times. Thereafter, PreMatch schedules the microservice replicas requiring dataflow processing on the computing devices. Experimental analysis shows that PreMatch achieves completion times on average 13% lower than a related prediction-based scheduling method.
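The matching step can be illustrated with the classic deferred-acceptance (Gale–Shapley) algorithm, a standard instance of game-theoretic matching. This is an illustrative sketch only: PreMatch's actual ranking criteria (processing and transmission times) are replaced here by pre-supplied preference lists, and the replica/device names are hypothetical:

```python
def stable_match(replica_prefs, device_prefs):
    """Gale–Shapley deferred acceptance: replicas propose to devices in
    preference order; a device keeps its best proposer so far.
    Assumes each replica ranks all devices and there are at least as
    many devices as replicas."""
    free = list(replica_prefs)           # replicas still proposing
    nxt = {r: 0 for r in replica_prefs}  # next device index each replica tries
    engaged = {}                         # device -> currently held replica
    while free:
        r = free.pop(0)
        d = replica_prefs[r][nxt[r]]
        nxt[r] += 1
        if d not in engaged:
            engaged[d] = r
        elif device_prefs[d].index(r) < device_prefs[d].index(engaged[d]):
            free.append(engaged[d])      # device trades up; old replica re-proposes
            engaged[d] = r
        else:
            free.append(r)               # rejected; try next device later
    return {r: d for d, r in engaged.items()}
```

The resulting assignment is stable: no replica and device both prefer each other over their assigned partners, which is the property matching-based schedulers rely on.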

Authors: Tom Tucek, Kseniia Harshina, Georgia Samaritaki (University of Amsterdam), and Dipika Rajesh (University of California, Santa Cruz)

Abstract:
This paper presents “One Spell Fits All”, an AI-native game prototype where the player, playing as a witch, solves villagers’ problems using magical conjurations. We show how, beyond being a standalone game, “One Spell Fits All” could serve as a research platform to explore several key areas in AI-driven and AI-native game design. These areas include AI creativity, user experience in predominantly AI-generated content, and the energy efficiency of locally running versus cloud-based AI models. By leveraging smaller, locally running generative AI models, including LLMs and diffusion models for image generation, the game dynamically generates and evaluates content without the need for external APIs or internet access, offering a sustainable and responsive gameplay experience. This paper explores the application of LLMs in narrative video games, outlines a game prototype’s design and mechanics, and proposes future research opportunities that can be explored using the game as a platform.

EXAG ’24: Experimental AI in Games Workshop at the AIIDE Conference, November 18, 2024, Lexington, USA