Authors: Kurt Horvath (University of Klagenfurt, Austria), Dragi Kimovski (University of Klagenfurt, Austria), Stojan Kitanov (Mother Theresa University Skopje, Macedonia), Radu Prodan (University of Klagenfurt, Austria)

Event: 2025 10th International Conference on Information and Network Technologies (ICINT), March 12–14, 2025, Melbourne, Australia

Abstract: The rapid digitalization of urban infrastructure opens the path to smart cities, where IoT-enabled infrastructure enhances public safety and efficiency. This paper presents a 6G- and AI-enabled framework for traffic safety enhancement, focusing on real-time detection and classification of emergency vehicles and leveraging 6G as the latest global communication standard. The system integrates sensor data acquisition, convolutional neural network-based threat detection, and user alert dissemination through the use case's software modules. We define the latency requirements for such a system, segmenting the end-to-end latency into computational and networking components. Our empirical evaluation demonstrates the impact of vehicle speed and user trajectory on system reliability. The results provide insights for network operators and smart city service providers, emphasizing the critical role of low-latency communication and how networks can enable relevant services for traffic safety.
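The latency segmentation described in the abstract lends itself to a simple back-of-the-envelope check. The sketch below is illustrative only: the component latencies and the warning-distance budget derived from vehicle speed are assumed values, not measurements from the paper.

```python
# Minimal sketch (not the paper's model): split end-to-end alert latency
# into computational and networking components, then compare it against
# a time budget derived from vehicle speed. All numbers are illustrative.

def e2e_latency_ms(t_capture, t_uplink, t_inference, t_downlink):
    """Total latency from sensor capture to user alert, in milliseconds."""
    return t_capture + t_uplink + t_inference + t_downlink

def latency_budget_ms(vehicle_speed_kmh, warning_distance_m):
    """Time available before the vehicle covers the warning distance."""
    speed_ms = vehicle_speed_kmh / 3.6           # km/h -> m/s
    return warning_distance_m / speed_ms * 1000  # seconds -> milliseconds

total = e2e_latency_ms(t_capture=20, t_uplink=10, t_inference=40, t_downlink=10)
budget = latency_budget_ms(vehicle_speed_kmh=70, warning_distance_m=50)
print(f"e2e latency {total} ms, budget {budget:.0f} ms, ok={total <= budget}")
```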

 


CLIP-DQA: Blindly Evaluating Dehazed Images from Global and Local Perspectives Using CLIP

The IEEE International Symposium on Circuits and Systems (IEEE ISCAS 2025)

https://2025.ieee-iscas.org/

25–28 May 2025 // London, United Kingdom

Yirui Zeng (Cardiff University, UK), Jun Fu (Cardiff University), Hadi Amirpour (AAU, Austria), Huasheng Wang (Alibaba Group), Guanghui Yue (Shenzhen University, China), Hantao Liu (Cardiff University), Ying Chen (Alibaba Group), Wei Zhou (Cardiff University)

Abstract: Blind dehazed image quality assessment (BDQA), which aims to accurately predict the visual quality of dehazed images without any reference information, is essential for the evaluation, comparison, and optimization of image dehazing algorithms. Existing learning-based BDQA methods have achieved remarkable success, but the small scale of DQA datasets limits their performance. To address this issue, in this paper, we propose to adapt Contrastive Language-Image Pre-Training (CLIP), pre-trained on large-scale image-text pairs, to the BDQA task. Specifically, inspired by the fact that the human visual system understands images based on hierarchical features, we take global and local information of the dehazed image as the input of CLIP. To accurately map the input hierarchical information of dehazed images into the quality score, we tune both the vision branch and language branch of CLIP with prompt learning. Experimental results on two authentic DQA datasets demonstrate that our proposed approach, named CLIP-DQA, achieves more accurate quality predictions than existing BDQA methods.
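For readers unfamiliar with CLIP-based quality assessment, the sketch below illustrates the general idea of scoring an image against an antonym prompt pair from both global and local views. It uses fixed prompts and the OpenAI `clip` package as assumptions; CLIP-DQA instead tunes both the vision and language branches with prompt learning, so this is only a zero-shot approximation of the setup.

```python
# Hedged sketch: zero-shot CLIP quality scoring with global + local views.
# Assumes the OpenAI CLIP package (pip install git+https://github.com/openai/CLIP.git).
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)
prompts = clip.tokenize(["a high-quality dehazed photo",
                         "a low-quality hazy photo"]).to(device)

def clip_quality(img: Image.Image) -> float:
    """Softmax probability of the 'high quality' prompt, in [0, 1]."""
    x = preprocess(img).unsqueeze(0).to(device)
    with torch.no_grad():
        logits_per_image, _ = model(x, prompts)
        return logits_per_image.softmax(dim=-1)[0, 0].item()

img = Image.open("dehazed.png")
global_score = clip_quality(img)                      # global view
w, h = img.size
local_scores = [clip_quality(img.crop((x, y, x + w // 2, y + h // 2)))
                for x in (0, w // 2) for y in (0, h // 2)]  # four local crops
score = 0.5 * global_score + 0.5 * sum(local_scores) / len(local_scores)
print(f"predicted quality: {score:.3f}")
```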

Multi-resolution Encoding for HTTP Adaptive Streaming using VVenC

The IEEE International Symposium on Circuits and Systems (IEEE ISCAS 2025)

https://2025.ieee-iscas.org/

25–28 May 2025 // London, United Kingdom

Kamran Qureshi (AAU, Austria), Hadi Amirpour (AAU, Austria), Christian Timmerer (AAU, Austria)

Abstract: HTTP Adaptive Streaming (HAS) is a widely adopted method for delivering video content over the Internet, requiring each video to be encoded at multiple bitrate and resolution pairs, known as representations, to adapt to various network conditions and device capabilities. This multi-bitrate encoding introduces significant challenges due to the computational and time-intensive nature of encoding multiple representations. Conventional approaches often encode these videos independently without leveraging similarities between different representations of the same input video. This paper proposes an accelerated multi-resolution encoding strategy that utilizes lower-resolution representations as references to speed up the encoding of higher resolutions with Versatile Video Coding (VVC), specifically in VVenC, an optimized open-source software implementation. For multi-resolution encoding, a mid-bitrate representation serves as the reference, allowing interpolated encoded partition data to efficiently guide the partitioning process at higher resolutions. The proposed approach uses shared encoding information to reduce redundant calculations, thereby optimizing partitioning decisions. Experimental results demonstrate that the proposed technique reduces encoding time by up to 17% compared to the medium preset across videos of varying complexity, with a minimal BDBR/BDT of 0.12 compared to the fast preset.
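The core idea, reusing a lower-resolution representation's partitioning to prune the search at higher resolutions, can be sketched in a few lines. The following toy is illustrative only (not the VVenC implementation), assuming a per-CTU split-depth map and a hypothetical one-level slack margin:

```python
# Illustrative sketch: project a split-depth map estimated at a lower
# resolution onto a higher resolution and use it to cap the RD search.
import numpy as np

def upscale_depth_map(low_res_depths: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour upscaling of a per-block split-depth map."""
    return np.repeat(np.repeat(low_res_depths, factor, axis=0), factor, axis=1)

def max_search_depth(ref_depth: int, margin: int = 1) -> int:
    """Allow at most `margin` extra split levels beyond the reference."""
    return ref_depth + margin

# 540p reference depths for a 2x2 grid of blocks, projected to 1080p (2x).
ref = np.array([[1, 3], [2, 0]])
hi = upscale_depth_map(ref, factor=2)
limits = np.vectorize(max_search_depth)(hi)
print(limits)  # per-block cap on the partition search at the higher resolution
```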

 

Improving the Efficiency of VVC using Partitioning of Reference Frames

The IEEE International Symposium on Circuits and Systems (IEEE ISCAS 2025)

https://2025.ieee-iscas.org/

25–28 May 2025 // London, United Kingdom

Kamran Qureshi (AAU, Austria), Hadi Amirpour (AAU, Austria), Christian Timmerer (AAU, Austria)

Abstract: In response to the growing demand for high-quality videos, a new coding standard, Versatile Video Coding (VVC), was released in 2020. VVC is based on the same hybrid coding architecture as its predecessor, High-Efficiency Video Coding (HEVC), providing a bitrate reduction of approximately 50% for the same subjective quality. VVC extends HEVC’s Coding Tree Unit (CTU) partitioning with more flexible block sizes, increasing its encoding complexity. Optimization is essential to making efficient use of VVC in practical applications. VVenC, an optimized open-source VVC encoder, introduces multiple presets to address the trade-off between compression efficiency and encoder complexity. Although an optimized set of encoding tools has been selected for each preset, the rate-distortion (RD) search space in the encoder presets still poses a challenge for efficient encoder implementations. This paper proposes Early Termination using Reference Frames (ETRF), which improves the trade-off between encoding efficiency and time complexity and positions itself as a new preset between the medium and fast presets. The CTU partitioning map of reference frames in lower temporal layers is employed to accelerate the encoding of frames in higher temporal layers. The results show a reduction in encoding time of around 22% compared to the medium preset. Specifically, for videos with high spatial and temporal complexities, which typically require longer encoding times, the proposed method shows an improved BDBR/BDT compared to the fast preset.
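The early-termination rule can be sketched in the same spirit. The snippet below is a hypothetical simplification of the ETRF idea: the function name, the slack parameter, and the loop are illustrative, not taken from the paper.

```python
# Hedged sketch: stop splitting a CTU once its depth reaches the depth of
# the co-located CTU in the lower-temporal-layer reference frame.

def should_terminate_split(current_depth: int,
                           colocated_ref_depth: int,
                           slack: int = 0) -> bool:
    """Early-terminate the RD search beyond the reference depth + slack."""
    return current_depth >= colocated_ref_depth + slack

# Example: the co-located CTU in the reference frame stopped at depth 2,
# so the current CTU skips evaluating splits at depth 3 and beyond.
for depth in range(5):
    if should_terminate_split(depth, colocated_ref_depth=2):
        print(f"terminate RD search at depth {depth}")
        break
```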

 

On 16 January 2024, a special PhD presentation took place as part of the ongoing open learning space organised by the master’s programme Game Studies and Engineering. Ria Sommer MA from the University of Osnabrück presented her ongoing research on Actual Play and Let’s Play practices. The event was moderated by Dr Felix Schniz BA MA and attended by students from various disciplines as well as AAU lecturers.

On January 8th, 2025, the 14th Video Browser Showdown (VBS) took place at the International Conference on Multimedia Modeling (MMM2025) in Nara, Japan. 17 international teams competed against each other for about five hours, solving very challenging tasks. VBS is the largest international video retrieval competition, and teams had to search a dataset of 31,715 video files totalling 5,930 hours, drawn from the V3C, LHE, and MVK video collections. This year, in addition to the typical task types, such as KIS-V/KIS-T (known-item search with visual and textual hints), AVS (ad-hoc video search), and QA (interactive question answering), we had a new category called KIS-C (conversational known-item search), in which an oracle of a few people gave hints and answered the teams’ questions about the scene of interest. Overall, VBS2025 was a challenging and fascinating event, with a lot of fun, exciting queries, and great video retrieval systems.

We would like to thank the 17 teams with 95 unique authors from 13 countries for their participation and congratulate them on their great success at VBS2025. Particular congratulations go to
– the NII-UIT team, who was the overall winner and provided the best expert system
– the VEAGLE team, who provided the best novice system

ITEC’s video retrieval system, diveXplore, also performed very well, achieving 3rd place in the overall scoring.

VBS is a huge event that requires a great deal of organizational effort.
Many thanks go to Werner Bailer for creating and checking all the queries in advance, to Cathal Gurrin for doing an amazing job moderating over 5 hours, to all the live judges (Dimitris Georgalis, Gylfi Þór Guðmundsson, Björn Þór Jónsson, Andreas Leibetseder, Laura Rettig, Heiko Schuldt, Dimitris Stefanopoulos) for assessing submissions that could not be evaluated automatically, the oracle members (Björn Þór Jónsson, Stevan Rudinac, Heiko Schuldt) for exciting query hints, the developers of the Distributed Retrieval Evaluation Server (DRES) (Luca Rossetto, Loris Sauter, Ralph Gasser), the creators of the datasets (particularly Luca Rossetto and Sai-Kit Yeung), and all the local organizers of MMM2025 (Ichiro Ide, Ioannis Kompatsiaris, ChangSheng Xu, Keiji Yanai, Chong-Wah Ngo, Shin’ichi Satoh, Marc Kastner, and others). THANK YOU ALL SO MUCH!

On 14 January, Dr Felix Schniz and GSE student Peter Miklautz organised the “Magic Academy.” The event took place in the library’s Unruhezone and introduced GSE and other students to the concept of Trading Card Games. It served as an introductory event for a series of follow-up “Magic Academies”, scheduled for the end of this semester and over the course of next semester. The series concludes in June with a three-day event on academic perspectives on Trading Card Games, featuring international guest speakers; workshops on the mechanics, stochastics, and analysis of different card game systems; and an analogue game-design mini-game jam.

With over 30 students attending, this kick-off was a magical success!

The paper “Two-pass Encoding for Live Video Streaming” has been selected as the Best Student Paper at the NAB Broadcast Engineering and IT (BEIT) Conference 2025.

NAB Broadcast Engineering and IT (BEIT) Conference

5–9 April 2025 | Las Vegas, NV, USA

Abstract: Live streaming has become increasingly important in our daily lives due to the growing demand for real-time content consumption. Traditional live video streaming typically relies on single-pass encoding due to its low latency. However, it lacks video content analysis, often resulting in inefficient compression and quality fluctuations during playback. Constant Rate Factor (CRF) encoding, a type of single-pass method, offers more consistent quality but suffers from unpredictable output bitrate, complicating bandwidth management. In contrast, multi-pass encoding improves compression efficiency through multiple passes. However, its added latency makes it unsuitable for live streaming. In this paper, we propose OTPS, an online two-pass encoding scheme that overcomes these limitations by employing fast feature extraction on a downscaled video representation and a gradient-boosting regression model to predict the optimal CRF for encoding. This approach provides consistent quality and efficient encoding while avoiding the latency introduced by traditional multi-pass techniques. Experimental results show that OTPS offers 3.7% higher compression efficiency than single-pass encoding and achieves up to 28.1% faster encoding than multi-pass modes. Compared to single-pass encoding, encoded videos using OTPS exhibit 5% less deviation from the target bitrate while delivering notably more consistent quality.

Authors: Mohammad Ghasempour (AAU, Austria); Hadi Amirpour (AAU, Austria); Christian Timmerer (AAU, Austria)
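The prediction step at the heart of the scheme described in the abstract can be approximated with scikit-learn. The sketch below is a toy: the feature names, training values, and hyperparameters are invented for illustration, only to show the shape of a gradient-boosting CRF predictor.

```python
# Hedged sketch of an OTPS-style predictor: cheap complexity features from
# a downscaled analysis pass -> CRF expected to hit the target bitrate.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical training set: [spatial_complexity, temporal_complexity,
# target_bitrate_kbps] -> CRF that met the target in offline encodes.
X_train = np.array([[30.0, 12.0, 3000], [55.0, 25.0, 3000],
                    [30.0, 12.0, 6000], [55.0, 25.0, 6000]])
y_train = np.array([26.0, 30.0, 22.0, 26.0])

model = GradientBoostingRegressor(n_estimators=200, max_depth=3)
model.fit(X_train, y_train)

segment_features = np.array([[42.0, 18.0, 4500]])  # from the downscaled pass
crf = float(model.predict(segment_features)[0])
print(f"predicted CRF for this segment: {crf:.1f}")
# The encoder then runs a single CRF-mode pass with the predicted value,
# keeping latency close to single-pass encoding.
```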

 

June 30 to July 4, 2025, Nantes, France

Delivering video content from a video server to viewers over the Internet is a time-consuming step in the streaming workflow and must be handled carefully to offer an uninterrupted streaming experience. The end-to-end latency, i.e., from camera capture to the user device, is particularly problematic for live streaming. Streaming-based applications such as virtual events, esports, online learning, gaming, webinars, and all-hands meetings require low latency for their operation. Video streaming is ubiquitous across applications, devices, and fields. Delivering high Quality-of-Experience (QoE) to streaming viewers is crucial, yet the large amount of data that must be processed to satisfy such QoE cannot be handled through manual tuning alone. Satisfying the requirements of low-latency video streaming applications requires the streaming workflow to be optimized and streamlined end to end, including media provisioning (capturing, encoding, packaging, and ingesting to the origin server), media delivery (from the origin to the CDN and from the CDN to the end users), and media playback (the end user’s video player). The 2nd Workshop on Surpassing Latency Limits in Adaptive Live Video Streaming (LIVES 2025) aims to bring together researchers and developers to address the data-intensive processing requirements and QoE challenges of live video streaming applications by leveraging heuristic and learning-based approaches.


The Graph-Massivizer Project is glad to announce that its research extends beyond European borders. The website receives continuous visits from colleagues in the US, Canada, China… and the project team is happy to develop connections with top researchers in #graphprocessing wherever they are.

In this context, Radu Prodan presented the Graph-Massivizer Project at the Indian Institute of Science (IISc) in #Bangalore, at the invitation of Prof. Yogesh Simmhan.