SDART: Spatial Dart AR Simulation with Hand-Tracked Input

ACM Multimedia 2025

October 27 – October 31, 2025

Dublin, Ireland

[PDF]

Milad Ghanbari (AAU, Austria), Wei Zhou (Cardiff, UK), Cosmin Stejerean (Meta, US), Christian Timmerer (AAU, Austria), Hadi Amirpour (AAU, Austria)

Abstract: We present a physics-driven 3D dart-throwing interaction system for Apple Vision Pro (AVP), developed with the Unity 6 engine and running in augmented reality (AR) mode on the device. The system uses the PolySpatial and Apple ARKit software development kits (SDKs) for hand input and tracking, allowing users to intuitively spawn, grab, and throw virtual darts much like real ones. The application combines physics simulation with AVP’s controller-free input system to manipulate objects realistically in an unbounded spatial volume. By implementing spatial distance measurement, scoring logic, and user performance recording, the system enables user studies on quality of experience in interactive applications. To evaluate the perceived quality and realism of the interaction, we conducted a subjective study with 10 participants using a structured questionnaire. The study measured various aspects of the user experience, including visual and spatial realism, control fidelity, depth perception, immersiveness, and enjoyment. Results indicate high mean opinion scores (MOS) across key dimensions. Link to video: Link
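
As a rough illustration of the distance-based scoring logic mentioned in the abstract (the actual system is built in Unity on AVP; the function name, ring radii, and ring scores below are hypothetical placeholders), a dart's landing point can be mapped to a score roughly as follows:

```python
import math

def dart_score(hit_point, board_center,
               ring_radii=(0.02, 0.05, 0.10, 0.17),
               ring_scores=(50, 25, 10, 5)):
    """Toy scoring-by-distance sketch: radii (meters) and scores are
    illustrative placeholders, not the values used in SDART."""
    dx, dy, dz = (h - c for h, c in zip(hit_point, board_center))
    distance = math.sqrt(dx * dx + dy * dy + dz * dz)
    for radius, score in zip(ring_radii, ring_scores):
        if distance <= radius:
            return score
    return 0  # dart landed outside the scored area

# Example: a dart landing 3 cm from the board centre
print(dart_score((0.03, 0.0, 0.0), (0.0, 0.0, 0.0)))  # -> 25
```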


VQualA 2025 Challenge on Image Super-Resolution Generated Content Quality Assessment: Methods and Results

ICCV VQualA 2025

October 19 – October 23, 2025

Hawai’i, USA

[PDF]

Hadi Amirpour (AAU, Austria), et al.

Abstract: This paper presents the ISRGC-Q Challenge, built upon the Image Super-Resolution Generated Content Quality Assessment (ISRGen-QA) dataset, and organized as part of the Visual Quality Assessment (VQualA) Competition at the ICCV 2025 Workshops. Unlike existing Super-Resolution Image Quality Assessment (SR-IQA) datasets, ISRGen-QA places greater emphasis on SR images generated by the latest generative approaches, including Generative Adversarial Networks (GANs) and diffusion models. The primary goal of this challenge is to analyze the unique artifacts introduced by modern super-resolution techniques and to evaluate their perceptual quality effectively. A total of 108 participants registered for the challenge, with 4 teams submitting valid solutions and fact sheets for the final testing phase. These submissions demonstrated state-of-the-art (SOTA) performance on the ISRGen-QA dataset. The project is publicly available at: https://github.com/Lighting-YXLI/ISRGen-QA.

VQualA 2025 Challenge on Face Image Quality Assessment: Methods and Results

ICCV VQualA 2025

October 19 – October 23, 2025

Hawai’i, USA

[PDF]

MohammadAli Hamidi (University of Cagliari, Italy), Hadi Amirpour (AAU, Austria), et al.

Abstract: Face images have become integral to various applications, but real-world capture conditions often lead to degradations such as noise, blur, compression artifacts, and poor lighting. These degradations negatively impact image quality and downstream tasks. To promote advancements in face image quality assessment (FIQA), we introduce the VQualA 2025 Challenge on Face Image Quality Assessment, part of the ICCV 2025 Workshops. Participants developed efficient models (≤0.5 GFLOPs, ≤5M parameters) predicting Mean Opinion Scores (MOS) under realistic degradations. Submissions were rigorously evaluated using objective metrics and human perceptual judgments. The challenge attracted 127 participants, resulting in 1519 valid final submissions. Detailed methodologies and results are presented, contributing to practical FIQA solutions.


A Lightweight Ensemble-Based Face Image Quality Assessment Method with Correlation-Aware Loss

ICCV VQualA 2025

October 19 – October 23, 2025

Hawai’i, USA

 

MohammadAli Hamidi (University of Cagliari, Italy), Hadi Amirpour (AAU, Austria), Luigi Atzori (University of Cagliari, Italy), Christian Timmerer (AAU, Austria)

Abstract: Face image quality assessment (FIQA) plays a critical role in face recognition and verification systems, especially in uncontrolled, real-world environments. Although several methods have been proposed, general-purpose no-reference image quality assessment techniques often fail to capture face-specific degradations. Meanwhile, state-of-the-art FIQA models tend to be computationally intensive, limiting their practical applicability. We propose a lightweight and efficient method for FIQA, designed for the perceptual evaluation of face images in the wild. Our approach integrates an ensemble of two compact convolutional neural networks, MobileNetV3-Small and ShuffleNetV2, with prediction-level fusion via simple averaging. To enhance alignment with human perceptual judgments, we employ a correlation-aware loss (MSECorrLoss), combining mean squared error (MSE) with a Pearson correlation regularizer. Our method achieves a strong balance between accuracy and computational cost, making it suitable for real-world deployment. Experiments on the VQualA FIQA benchmark demonstrate that our model achieves a Spearman rank correlation coefficient (SRCC) of 0.9829 and a Pearson linear correlation coefficient (PLCC) of 0.9894, remaining within competition efficiency constraints.
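
As a minimal sketch of the correlation-aware loss and prediction-level fusion described above (written in PyTorch; the weighting factor lam, the epsilon term, and the batch-level Pearson formulation are assumptions for illustration, not the authors' implementation):

```python
import torch
import torch.nn as nn

class MSECorrLoss(nn.Module):
    """Illustrative correlation-aware loss: MSE plus a Pearson-correlation
    regularizer computed over the batch (the weighting `lam` is hypothetical)."""
    def __init__(self, lam: float = 0.5, eps: float = 1e-8):
        super().__init__()
        self.lam = lam
        self.eps = eps
        self.mse = nn.MSELoss()

    def forward(self, pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        pred, target = pred.flatten(), target.flatten()
        # Pearson correlation between predicted and ground-truth MOS in the batch
        p, t = pred - pred.mean(), target - target.mean()
        corr = (p * t).sum() / (p.norm() * t.norm() + self.eps)
        # Penalize low correlation alongside the squared error
        return self.mse(pred, target) + self.lam * (1.0 - corr)


def ensemble_score(mobilenet_pred: torch.Tensor, shufflenet_pred: torch.Tensor) -> torch.Tensor:
    """Prediction-level fusion of the two backbones by simple averaging."""
    return 0.5 * (mobilenet_pred + shufflenet_pred)
```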


Depth-Enabled Inspection of Medical Videos

ACM Multimedia 2025

October 27 – October 31, 2025

Dublin, Ireland

Hadi Amirpour (AAU, Austria), Doris Putzgruber-Adamitsch (AAU, Austria), Yosuf El-Shabrawi (Kabeg, Austria), Klaus Schoeffmann (AAU, Austria)

Abstract: Cataract surgery is the most frequently performed surgical procedure worldwide, involving the replacement of a patient’s clouded eye lens with a synthetic intraocular lens to restore visual acuity. Although typically brief, the operation consists of distinct phases that demand precision and extensive training, traditionally constrained by the limitations of real-time observation under a microscope. To enhance learning and procedural accuracy, modern advancements in stereoscopic video capture and head-mounted displays (HMDs) offer a promising solution. This paper demonstrates the application of stereoscopic cataract surgery videos, visualized through Apple Vision Pro (AVP) and Meta Quest 3, to provide immersive 3D perspectives that enhance depth perception and spatial awareness. An expert evaluation study with experienced surgeons indicates that stereoscopic visualization significantly improves comprehension of spatial relationships and procedural maneuvers, suggesting its potential to revolutionize surgical education and real-time guidance in ophthalmic surgery. Demo video: Link

Real-Time AI-Driven Avatar Generation for Sign Language in HTTP Adaptive Streaming

The 3rd ACM SIGCOMM Workshop on Emerging Multimedia Systems (ACM EMS 2025)

https://conferences.sigcomm.org/sigcomm/2025/workshop/ems/

8 September 2025 // Coimbra, Portugal

 

Daniele Lorenzi (AAU, Austria), Emanuele Artioli (AAU, Austria), Farzad Tashtarian (AAU, Austria), Christian Timmerer (AAU, Austria)

Abstract: As digital media consumption over the Internet surges globally, ensuring accessibility for all users becomes paramount. For people with hearing impairments, this means providing inclusion beyond classic captioning, which does not convey the full emotional and contextual depth of spoken content. This work addresses this accessibility gap by exploring the use of AI-generated avatars capable of translating speech into sign language in real time. After defining the multifaceted challenges in this domain, we propose a novel AI-driven task partitioning approach to animate avatars for accurate and expressive sign language interpretations in live streaming.


Unlocking Implicit Motion for Evaluating Image Complexity

Displays

 

Yixiao Li (Beihang University, China), Xiaoyuan Yang (Beihang University, China), Yanda Meng (University of Exeter, UK), Hadi Amirpour (AAU, Austria), Jiang Liu (Cardiff University, UK), Yuqing Luo (Cardiff University, UK), Hantao Liu (Cardiff University, UK), and Wei Zhou (Cardiff University, UK)

Abstract: Image complexity (IC) plays a critical role in both cognitive science and multimedia computing, influencing visual aesthetics, emotional responses, and tasks such as image classification and enhancement. However, defining and quantifying IC remains challenging due to its multifaceted nature, which encompasses both objective attributes (e.g., detail, structure) and subjective human perception. While traditional methods rely on entropy-based or multidimensional approaches, and recent advances employ machine learning and shallow neural networks, these techniques often fail to fully capture the subjective aspects of IC. Inspired by the fact that the human visual system inherently perceives implicit motion in static images, we propose a novel approach to address this gap by explicitly incorporating hidden motion into IC assessment. We introduce the motion-inspired image complexity assessment metric (MICM) as a new framework for this purpose. MICM introduces a dual-branch architecture: One branch extracts spatial features from static images, while the other generates short video sequences to analyze latent motion dynamics. To ensure meaningful motion representation, we design a hierarchical loss function that aligns video features with text prompts derived from image-to-text models, refining motion semantics at both local (i.e., frame and word) and global levels. Experiments on three public image complexity assessment (ICA) databases demonstrate that our approach, MICM, significantly outperforms state-of-the-art methods, validating its effectiveness. The code will be publicly available upon acceptance of the paper.
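
A minimal sketch of the dual-branch idea described above (PyTorch, with placeholder encoders, layer sizes, and fusion head; the image-to-clip generation step and the hierarchical text-alignment loss are omitted, so this does not reflect MICM's actual architecture):

```python
import torch
import torch.nn as nn

class DualBranchICM(nn.Module):
    """Illustrative dual-branch complexity model: a 2D branch for the static
    image and a 3D branch for a short clip standing in for latent motion."""
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.spatial = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, feat_dim))
        self.motion = nn.Sequential(
            nn.Conv3d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(32, feat_dim))
        self.head = nn.Linear(2 * feat_dim, 1)  # scalar complexity score

    def forward(self, image: torch.Tensor, clip: torch.Tensor) -> torch.Tensor:
        # image: (B, 3, H, W); clip: (B, 3, T, H, W), assumed to be produced
        # from the image by a separate image-to-video model
        fused = torch.cat([self.spatial(image), self.motion(clip)], dim=1)
        return self.head(fused)
```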

 


Authors: Ahmed Telili (TII, UAE), Wassim Hamidouche (TII, UAE), Brahim Farhat (TII, UAE), Hadi Amirpour (AAU, Austria), Christian Timmerer (AAU, Austria), Ibrahim Khadraoui (TII, UAE), Jiajie Lu (Politecnico di Milano, Italy), The Van Le (IVCL, South Korea), Jeonneung Baek (IVCL, South Korea), Jin Young Lee (IVCL, South Korea), Yiying Wei (AAU, Austria), Xiaopeng Sun (Meituan Inc., China), Yu Gao (Meituan Inc., China), JianCheng Huang (Meituan Inc., China), and Yujie Zhong (Meituan Inc., China)

Journal: Signal Processing: Image Communication

Abstract: Omnidirectional (360-degree) video is rapidly gaining popularity due to advancements in immersive technologies like virtual reality (VR) and extended reality (XR). However, real-time streaming of such videos, particularly in live mobile scenarios such as unmanned aerial vehicles (UAVs), is hindered by limited bandwidth and strict latency constraints. While traditional methods such as compression and adaptive resolution are helpful, they often compromise video quality and introduce artifacts that diminish the viewer’s experience. Additionally, the unique spherical geometry of 360-degree video, with its wide field of view, presents challenges not encountered in traditional 2D video. To address these challenges, we initiated the 360-degree Video Super Resolution and Quality Enhancement challenge. This competition encourages participants to develop efficient machine learning (ML)-powered solutions to enhance the quality of low-bitrate compressed 360-degree videos, under two tracks focusing on 2× and 4× super-resolution (SR). In this paper, we outline the challenge framework, detailing the two competition tracks and highlighting the SR solutions proposed by the top-performing models. We assess these models within a unified framework, considering (i) quality enhancement, (ii) bitrate gain, and (iii) computational efficiency. Our findings show that lightweight single-frame models can effectively balance visual quality and runtime performance under constrained conditions, setting strong baselines for future research. These insights offer practical guidance for advancing real-time 360-degree video streaming, particularly in bandwidth-limited immersive applications.

 

Authors: Kurt Horvath, Dragi Kimovski, Radu Prodan

Venue: 2025 IEEE International Conference on Edge Computing and Communications (IEEE EDGE 2025), July 7–12, 2025, Helsinki, Finland

Abstract: Scheduling services within the computing continuum is complex due to the dynamic interplay of the Edge, Fog, and Cloud resources, each offering distinct computational and networking advantages. This paper introduces SCAREY, a user location-aided service lifecycle management framework based on state machines. SCAREY addresses critical service discovery, provisioning, placement, and monitoring challenges by providing unified dynamic state machine-based lifecycle management, allowing instances to transition between discoverable and non-discoverable states based on demand. It incorporates a scalable service deployment algorithm to adjust the number of instances and employs network measurements to optimize service placement, ensuring minimal latency and enhancing sustainability. Real-world evaluations demonstrate a 73% improvement in service discovery and acquisition times, 45% lower operating costs, and over 57% less power consumption and lower CO2 emissions compared to existing related methods.
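
As a toy illustration of the demand-driven state-machine concept mentioned in the abstract (the state names, threshold rule, and class below are hypothetical and do not reproduce SCAREY's actual lifecycle model):

```python
from enum import Enum, auto

class ServiceState(Enum):
    # Hypothetical lifecycle states; the paper's exact state set may differ.
    DISCOVERABLE = auto()
    NON_DISCOVERABLE = auto()

class ServiceInstance:
    """Toy lifecycle manager: the instance is discoverable only while
    observed demand meets an illustrative threshold."""
    def __init__(self, demand_threshold: int = 1):
        self.state = ServiceState.NON_DISCOVERABLE
        self.demand_threshold = demand_threshold

    def update(self, current_demand: int) -> ServiceState:
        # Transition between discoverable and non-discoverable based on demand
        if current_demand >= self.demand_threshold:
            self.state = ServiceState.DISCOVERABLE
        else:
            self.state = ServiceState.NON_DISCOVERABLE
        return self.state
```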


 

 

We are happy to announce that our paper “EnergyLess: An Energy-Aware Serverless Workflow Batch Orchestration on the Computing Continuum” (by Reza Farahani and Radu Prodan) has been accepted for IEEE CLOUD 2025, which will take place in Helsinki, Finland, in July 2025.

Venue: IEEE International Conference on Cloud Computing 2025 (IEEE CLOUD 2025)

Abstract: Serverless cloud computing is increasingly adopted for workflow management, optimizing resource utilization for providers while lowering costs for customers. Integrating edge computing into this paradigm enhances scalability and efficiency, enabling seamless workflow distribution across geographically dispersed resources on the computing continuum. However, existing serverless workflow orchestration methods on the computing continuum often prioritize time and cost objectives, neglecting energy consumption and carbon footprint. This paper introduces EnergyLess, a multi-objective concurrent serverless workflow batch orchestration service for the computing continuum. EnergyLess decomposes workflow functions within a batch into finer-grained sub-functions and schedules either the original or sub-function versions to appropriate regions and instances on the continuum, improving energy consumption, carbon footprint, economic cost, and completion time while considering individual workflow requirements and resource constraints. We formulate the problem as a mixed-integer nonlinear programming (MINLP) model and propose three lightweight heuristic algorithms for function decomposition and scheduling. Evaluations on a large-scale computing continuum testbed with realistic workflows, spanning AWS Lambda, Google Cloud Functions (GCF), and 325 fog and edge instances across six regions, demonstrate that EnergyLess improves cost efficiency by 75%, completion time by 6%, energy consumption by 15%, and CO2 emissions by 20% for a batch size of 300, compared to three baseline methods.