Paper Title: Eye-Tracking, Quality Assessment, and QoE Prediction Models for Point Cloud Videos: Extended Analysis of the ComPEQ-MR Dataset

Link: https://ieeexplore.ieee.org/document/11263821

 

Authors: Shivi Vats (AAU, Austria), Minh Nguyen (Fraunhofer FOKUS, Berlin), Christian Timmerer (AAU, Austria), Hermann Hellwagner (AAU, Austria)

Abstract: 

Point cloud videos, also termed dynamic point clouds (DPCs), have the potential to provide immersive experiences with six degrees of freedom (6DoF). However, there are still several open issues in understanding the Quality of Experience (QoE) and visual attention of end users while experiencing 6DoF volumetric videos. For instance, the quality impact of compressing DPCs, which requires a significant amount of both time and computational resources, needs further investigation. Also, QoE prediction models for DPCs in 6DoF have rarely been developed due to the lack of visual quality databases. Furthermore, visual attention in 6DoF is hardly explored, which impedes research into more sophisticated approaches for adaptive streaming of DPCs. In this paper, we review and analyze in detail the open-source Compressed Point cloud dataset with Eye-tracking and Quality assessment in Mixed Reality (ComPEQ-MR). The dataset, initially presented in [24], comprises 4 uncompressed (raw) DPCs as well as compressed versions processed by Moving Picture Experts Group (MPEG) reference tools (i.e., V-PCC and two G-PCC variants). The dataset includes eye-tracking data of 41 study participants watching the raw DPCs with 6DoF, yielding 164 visual attention maps. We analyze this data and present head and gaze movement results here. The dataset also includes results from subjective tests conducted to assess the quality of the DPCs, each both uncompressed and compressed with 12 levels of distortion, resulting in 2132 quality scores. This work presents the QoE performance results of the compression techniques, the factors with significant impact on participant ratings, and the correlation of the objective Peak Signal-to-Noise Ratio (PSNR) metrics with Mean Opinion Scores (MOS).
The results indicate superior performance of the V-PCC codec as well as significant variations in quality ratings based on codec choice, bitrate, and quality/distortion level, providing insights for optimizing point cloud video compression in MR applications. Finally, making use of the subjective scores, we trained and evaluated models for QoE prediction for DPCs compressed using the pertinent MPEG tools. We present the models and their prediction results, noting that the fine-tuned ITU-T P.1203 models exhibit good correlation with the subjective ratings. The dataset is available at https://ftp.itec.aau.at/datasets/ComPEQ-MR/.
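The dataset's visual attention maps aggregate where participants looked while viewing the DPCs. As a toy illustration of that idea (this is not the paper's pipeline; coordinates, grid size, and the binning rule are assumptions), gaze fixations in normalized viewport coordinates can be binned into a coarse attention grid:

```python
from collections import Counter

def attention_map(gaze_points, grid=(4, 4)):
    """Aggregate normalized gaze fixations into a coarse visual attention map.

    gaze_points: iterable of (x, y) with coordinates in [0, 1].
    Returns a Counter mapping (col, row) grid cells to fixation counts.
    """
    cols, rows = grid
    cells = Counter()
    for x, y in gaze_points:
        c = min(int(x * cols), cols - 1)  # clamp x == 1.0 into the last column
        r = min(int(y * rows), rows - 1)  # clamp y == 1.0 into the last row
        cells[(c, r)] += 1
    return cells

# Hypothetical fixations: three clustered near the center, one outlier.
points = [(0.45, 0.5), (0.5, 0.55), (0.52, 0.48), (0.9, 0.1)]
amap = attention_map(points, grid=(4, 4))
```

Real attention maps are typically continuous saliency heatmaps (e.g., Gaussian-smoothed), but the counting step above captures the basic aggregation.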

Title: Predicting Encoding Energy from Low-Pass Anchors for Green Video Streaming

Authors: Zoha Azimi (AAU, Austria), Reza Farahani (AAU, Austria), Vignesh V Menon (Fraunhofer HHI, Berlin), Christian Timmerer (AAU, Austria)

Event: 1st International Workshop on Intelligent and Scalable Systems Across the Computing Continuum (ScaleSys ’25), November 18, 2025, Vienna, Austria, https://scalesys2025.itec.aau.at/

Abstract:  

Video streaming now represents the dominant share of Internet traffic, as ever-higher-resolution content is distributed across a growing range of heterogeneous devices to sustain user Quality of Experience (QoE). However, this trend raises significant concerns about energy efficiency and carbon emissions, requiring methods that trade off energy against QoE. This paper proposes a lightweight energy prediction method that estimates the energy consumption of high-resolution video encodings using reference encodings generated at lower resolutions (so-called anchors), eliminating the need for exhaustive per-segment energy measurements, a process that is infeasible at scale. We automatically select encoding parameters, such as resolution and quantization parameter (QP), to achieve substantial energy savings while maintaining perceptual quality, as measured by the Video Multimethod Assessment Fusion (VMAF) metric, within acceptable limits. We implement and evaluate our approach with the open-source VVenC encoder on 100 video sequences from the Inter4K dataset across multiple encoding settings. Results show that, for an average VMAF score reduction of only 1.68, which stays below the Just Noticeable Difference (JND) threshold, our method achieves 51.22% encoding energy savings and 53.54% decoding energy savings compared to a scenario with no quality degradation.
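The core idea, extrapolating high-resolution encoding energy from a cheap low-resolution anchor encode, can be sketched with a simple power-law scaling model. This is an illustrative stand-in, not the paper's fitted predictor; the exponent `alpha` and all numbers below are hypothetical:

```python
def predict_energy(anchor_joules, anchor_res, target_res, alpha=0.9):
    """Estimate encoding energy at a target resolution from a measured
    low-resolution anchor encode, assuming energy scales with the pixel-count
    ratio raised to a model exponent `alpha` (hypothetical coefficient; the
    paper derives its own prediction model)."""
    ax, ay = anchor_res
    tx, ty = target_res
    return anchor_joules * ((tx * ty) / (ax * ay)) ** alpha

# e.g., extrapolate a hypothetical 12 J measurement at 540p up to 2160p
e_4k = predict_energy(12.0, (960, 540), (3840, 2160), alpha=0.9)
```

The benefit is that only the cheap anchor encode needs to be measured; the expensive high-resolution encodes are never run just to obtain energy numbers.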

 

Title: Data-Efficient Learning for Generalizable Surgical Video Understanding
Author: Sahar Nasirihaghighi

Venue: Doctoral Consortium MICCAI 2025, 23 – 27 September 2025, Daejeon, South Korea
Abstract: Advances in surgical video analysis are transforming operating rooms into intelligent, data-driven environments. Computer-assisted systems now support the full surgical workflow, from preoperative planning to intraoperative guidance and postoperative assessment. However, developing robust and generalizable models for surgical video understanding remains challenging due to (I) annotation cost and scarcity, (II) spatiotemporal complexity, and (III) domain gap across procedures and institutions. This doctoral research aims to bridge the gap between deep learning–based surgical video analysis in research and its real-world clinical deployment. To address the core challenge of recognizing surgical phases, actions, and events, critical for video-based analysis, I benchmarked state-of-the-art neural network architectures to identify the most effective designs for each task. I further improved performance by proposing novel architectures and integrating advanced modules. Given the high cost of expert annotations and the domain gap across surgical video sources, I focused on reducing reliance on labeled data. We developed semi-supervised frameworks that improve model performance across tasks by leveraging large amounts of unlabeled surgical video. We introduced novel semi-supervised frameworks, including DIST, SemiVTSurge, and ENCORE, that achieved state-of-the-art results on challenging surgical datasets by leveraging minimal labeled data and enhancing model training through dynamic pseudo-labeling. To support reproducibility and advance the field, we released two multi-task datasets: GynSurg, the largest gynecologic laparoscopy dataset, and Cataract-1K, the largest cataract surgery video dataset. Together, this work contributes to robust, data-efficient, and clinically scalable solutions for surgical video analysis, laying the foundation for generalizable AI systems that can meaningfully impact surgical care and training.

Title: GynSurg: A Comprehensive Gynecology Laparoscopic Surgery Dataset
Authors: Sahar Nasirihaghighi, Negin Ghamsarian, Leonie Peschek, Matteo Munari, Heinrich Husslein, Raphael Sznitman, Klaus Schoeffmann
Venue: ACM Multimedia 2025, 27 – 31 October 2025, Dublin, Ireland

Abstract: Recent advances in deep learning have transformed computer-assisted intervention and surgical video analysis, driving improvements not only in surgical training, intraoperative decision support, and patient outcomes, but also in postoperative documentation and surgical discovery. Central to these developments is the availability of large, high-quality annotated datasets. In gynecologic laparoscopy, surgical scene understanding and action recognition are fundamental for building intelligent systems that assist surgeons during operations and provide deeper analysis after surgery. However, existing datasets are often limited by small scale, narrow task focus, or insufficiently detailed annotations, limiting their utility for comprehensive, end-to-end workflow analysis. To address these limitations, we introduce GynSurg, the largest and most diverse multitask dataset for gynecologic laparoscopic surgery to date. GynSurg provides rich annotations across multiple tasks, supporting applications in action recognition, semantic segmentation, surgical documentation, and discovery of novel procedural insights. We demonstrate the dataset’s quality and versatility by benchmarking state-of-the-art models under a standardized training protocol. To accelerate progress in the field, we publicly release the GynSurg dataset and its annotations (https://ftp.itec.aau.at/datasets/GynSurge/).

Title: Agentic Edge Intelligence: A Research Agenda

Authors: Lauri Lovén, Reza Farahani, Ilir Murturi, Stephan Sigg, Schahram Dustdar

Abstract: Agentic AI is rapidly transforming autonomous decision-making, yet its deployment across the edge-cloud continuum remains poorly understood. This paper introduces the concept of agentic edge intelligence, an emerging paradigm in which autonomous agents operate across the computing continuum to negotiate computational resources, data, and services within dynamic digital marketplaces. We position this concept at the intersection of edge intelligence, multi-agent systems, and computational economics, where distributed decision-making replaces centralized orchestration. The paper outlines key research challenges, including scalability, interoperability, market stability, and ethical governance, and proposes a research agenda addressing theoretical, architectural, and societal dimensions. By integrating mechanism design with trustworthy AI and edge computing, the real-time AI economy envisions a self-organizing infrastructure for efficient, transparent, and equitable resource exchange in future digital ecosystems.

Venue: International Workshop on Intelligent Systems and Paradigms for Next Generation Computing Evolution (INSPIRE 2025) in conjunction with the 18th IEEE/ACM Utility and Cloud Computing Conference (UCC)

Title: Serverless Everywhere: A Comparative Analysis of WebAssembly Workflows Across Browser, Edge, and Cloud

Authors: Mario Colosi, Reza Farahani, Lauri Lovén, Radu Prodan, Massimo Villari

Abstract: WebAssembly (Wasm) is a binary instruction format that enables portable, sandboxed, and near-native execution across heterogeneous platforms, making it well-suited for serverless workflow execution on browsers, edge nodes, and cloud servers. However, its performance and stability depend heavily on factors such as startup overhead, runtime execution model (e.g., Ahead-of-Time (AOT) and Just-in-Time (JIT) compilation), and resource variability across deployment contexts. This paper evaluates a Wasm-based serverless workflow executed consistently from the browser to edge and cloud instances. The setup uses wasm32-wasi modules: in the browser, execution occurs within a web worker, while on edge and cloud instances, an HTTP shim streams frames to the Wasm runtime. We measure cold- and warm-start latency, per-step delays, workflow makespan, throughput, and CPU/memory utilization to capture the end-to-end behavior across environments. Results show that AOT compilation and instance warming substantially reduce startup latency. For workflows with small payloads, the browser achieves competitive performance owing to fully in-memory data exchanges. In contrast, as payloads grow, the workflow transitions into a compute- and memory-intensive phase where AOT execution on edge and cloud nodes distinctly surpasses browser performance.

Venue: International Workshop on Intelligent and Scalable Systems across the Computing Continuum (ScaleSys 2025) in conjunction with the 15th International Conference on the Internet of Things (IoT 2025)
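The cold- vs. warm-start distinction measured in the paper can be illustrated with a generic timing harness. Here `instantiate` is a hypothetical stand-in for Wasm module compilation and instantiation, with a cache standing in for instance warming; it is not the paper's measurement setup:

```python
import time

_module_cache = {}

def instantiate(path):
    """Hypothetical stand-in for Wasm compile + instantiate: the cold path
    pays a simulated compilation cost, while a warm start reuses the cached
    instance."""
    if path not in _module_cache:
        time.sleep(0.01)  # simulated compilation dominating the cold start
        _module_cache[path] = object()
    return _module_cache[path]

def timed(fn, *args):
    """Return (result, elapsed seconds) for a single call."""
    t0 = time.perf_counter()
    out = fn(*args)
    return out, time.perf_counter() - t0

_, cold = timed(instantiate, "pipeline.wasm")  # first call: compile + instantiate
_, warm = timed(instantiate, "pipeline.wasm")  # second call: cache hit
```

AOT compilation shifts the simulated `time.sleep` cost out of the request path entirely, which is why the paper observes it reducing startup latency.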

Title: Toward Sustainability-Aware LLM Inference on Edge Clusters

Authors: Kolichala Rajashekar, Nafiseh Sharghivand, Radu Prodan, Reza Farahani

Abstract: Large language models (LLMs) require substantial computational resources, leading to significant carbon emissions and operational costs. Although training is energy-intensive, the long-term environmental burden arises from inference, amplified by the massive global query volume. Cloud-based inference offers scalability but suffers from latency and bandwidth constraints due to centralized processing and continuous data transfer. Edge clusters can instead mitigate these limitations by enabling localized execution, yet they face trade-offs between performance, energy efficiency, and device constraints. This short paper presents a sustainability-aware LLM inference approach for edge clusters comprising NVIDIA Jetson Orin NX (8GB) and NVIDIA Ada 2000 (16GB) devices. It aims to balance inference latency and carbon footprint through carbon- and latency-aware routing strategies, guided by empirical benchmarking of energy consumption and execution time across diverse prompts and batch (i.e., group of prompts) configurations. We compared baseline greedy strategies to carbon-aware and latency-aware strategies for routing prompts to specific hardware based on benchmarking information. Experimental evaluation shows that a batch size of four prompts achieves a good trade-off between throughput and energy efficiency, while larger batches risk GPU memory saturation.

Venue: International Workshop on Intelligent and Scalable Systems across the Computing Continuum (ScaleSys 2025) in conjunction with the 15th International Conference on the Internet of Things (IoT 2025)
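Carbon- and latency-aware routing can be sketched as a weighted cost over per-device benchmarks. The device labels echo the hardware named above, but the benchmark numbers and the scoring rule are made up for illustration and are not the paper's strategies:

```python
def route(devices, w_carbon=0.5):
    """Pick the device minimizing a weighted sum of normalized per-batch
    latency (s) and carbon footprint (gCO2). Benchmark values hypothetical."""
    max_lat = max(d["latency_s"] for d in devices)
    max_co2 = max(d["carbon_g"] for d in devices)

    def cost(d):
        return (w_carbon * d["carbon_g"] / max_co2
                + (1 - w_carbon) * d["latency_s"] / max_lat)

    return min(devices, key=cost)["name"]

# Hypothetical per-batch benchmarks for the two device classes.
devices = [
    {"name": "jetson-orin-nx", "latency_s": 3.2, "carbon_g": 0.9},
    {"name": "ada-2000",       "latency_s": 1.1, "carbon_g": 2.4},
]

greenest = route(devices, w_carbon=1.0)  # pure carbon-aware routing
fastest  = route(devices, w_carbon=0.0)  # pure latency-aware routing
```

Sweeping `w_carbon` between 0 and 1 traces the latency/carbon trade-off the paper explores empirically.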

 

 

Authors: Samira Afzal (Baylor University), Narges Mehran (Salzburg Research Forschungsgesellschaft mbH), Farzad Tashtarian (AAU, Austria), Andrew C. Freeman (Baylor University), Radu Prodan (University of Innsbruck), Christian Timmerer (AAU, Austria)

Venue: IEEE VCIP 2025, December 1 – December 4, 2025, Klagenfurt, Austria

Abstract: The environmental impact of video streaming is gaining more attention due to its growing share in global internet traffic and energy consumption. To support accurate and transparent sustainability assessments, we present SEED (Streaming Energy and Emission Dataset): an open dataset for estimating energy usage and CO2 emissions in adaptive video streaming. SEED comprises over 500 video segments. It provides segment-level measurements of energy consumption and emissions for two primary stages: provisioning, which encompasses encoding and storage on cloud infrastructure, and end-user consumption, including network interface retrieval, video decoding, and display on end-user devices. The dataset covers multiple codecs (AVC, HEVC), resolutions, bitrates, cloud instance types, and geographic regions, reflecting real-world variations in computing efficiency and regional carbon intensity. By combining empirical benchmarks with component-level energy models, SEED enables detailed analysis and supports the development of energy- and emission-aware adaptive bitrate (ABR) algorithms. The dataset is publicly available at: https://github.com/cd-athena/SEED.
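SEED-style emission estimates combine measured energy with regional grid carbon intensity. A minimal sketch of that conversion follows; the energy and intensity values are illustrative, not taken from the dataset:

```python
def emissions_g(energy_joules, carbon_intensity_g_per_kwh):
    """CO2 emissions (grams) for a streaming stage: measured energy in joules
    converted to kWh (1 kWh = 3.6e6 J), multiplied by the regional grid's
    carbon intensity in gCO2/kWh. All inputs below are illustrative."""
    return energy_joules / 3.6e6 * carbon_intensity_g_per_kwh

# Same hypothetical per-segment encode, deployed in two regions: emissions
# scale directly with the regional grid's carbon intensity.
encode_j = 1.8e5
low = emissions_g(encode_j, 120)   # low-carbon grid
high = emissions_g(encode_j, 650)  # carbon-heavy grid
```

This is why the dataset tracks cloud regions: identical encodes can differ severalfold in emissions depending on where they run.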


NeVES: Real-Time Neural Video Enhancement for HTTP Adaptive Streaming

IEEE VCIP 2025

December 1 – December 4, 2025

Klagenfurt, Austria


Daniele Lorenzi, Farzad Tashtarian, Christian Timmerer

Abstract: Enhancing low-quality video content is a task that has raised particular interest since recent developments in deep learning. Since most of the video content consumed worldwide is delivered over the Internet via HTTP Adaptive Streaming (HAS), implementing these techniques in web browsers would ease access to visually enhanced content on user devices.

In this paper, we present NeVES, a multimedia system capable of enhancing the quality of video content streamed through HAS in real time.

The demo is available at: https://github.com/cd-athena/NeVES.

Perceptual Quality Assessment of Spatial Videos on Apple Vision Pro

ACMMM IXR 2025

October 27 – October 31, 2025

Dublin, Ireland


Afshin Gholami, Sara Baldoni, Federica Battisti, Wei Zhou, Christian Timmerer, Hadi Amirpour

Abstract: Immersive stereoscopic/3D video experiences have entered a new era with the advent of smartphones capable of capturing spatial videos, advanced video codecs optimized for multiview content, and Head Mounted Displays (HMDs) that natively support spatial video playback. In this work, we evaluate the quality of spatial videos encoded using optimized x265 software implementations of MV-HEVC on the Apple Vision Pro (AVP) and compare them with their corresponding 2D versions through a subjective test.

To support this study, we introduce SV-QoE, a novel dataset comprising video clips rendered with a twin-camera setup that replicates the human inter-pupillary distance. Our analysis reveals that spatial videos consistently deliver a superior Quality of Experience (QoE) when encoded at identical bitrates, with the benefits becoming more pronounced at higher bitrates. Additionally, renderings at closer distances exhibit significantly enhanced video quality and depth perception, highlighting the impact of spatial proximity on immersive viewing experiences.

We further analyze the impact of disparity on depth perception and examine the correlation between Mean Opinion Score (MOS) and established objective quality metrics such as PSNR, SSIM, MS-SSIM, VMAF, and AVQT. Additionally, we explore how video quality and depth perception together influence overall quality judgments.
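A common first step in the MOS-vs-metric analysis described above is a linear correlation check. The sketch below implements Pearson's r from scratch and applies it to made-up PSNR/MOS pairs; the actual SV-QoE scores are in the paper:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson linear correlation coefficient: a first check of how well an
    objective metric (e.g., PSNR) tracks subjective MOS."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

psnr = [28.1, 31.4, 34.0, 36.8, 39.5]  # hypothetical per-clip PSNR (dB)
mos = [1.9, 2.6, 3.4, 4.1, 4.6]        # hypothetical mean opinion scores
r = pearson(psnr, mos)
```

Subjective studies usually report Spearman's rank correlation alongside Pearson's, since metric-to-MOS mappings are often monotone but nonlinear.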