On 10 June 2026, Dr Felix Schniz hosted a session on the video game Bloodborne for AAU’s Media Club. Following this semester’s Media Club leitmotif of ‘the fantastic,’ Felix explored the game’s depiction of arcane architecture, dream spaces, and the sublime in virtual realms. With 15 attendees, including guests from Salzburg who travelled to campus specifically for the occasion, the session was a fantastic conclusion to this semester’s Media Club schedule.

Last week, Francesco Marchetto and Klaus Schoeffmann presented their work on surgical image synthesis at the IEEE CBMS 2026 (Computer-Based Medical Systems) conference in Limassol, Cyprus. Their paper, entitled “Hybrid Semantic Augmentation for Cataract Surgery Image Synthesis with GANs and Diffusion-based Models”, investigates how augmenting semantics in conditional generative models can help overcome the critical shortage of annotated training data in surgical AI.

The work introduces a student-teacher augmentation framework in which a trained generative model acts as a teacher and produces synthetic surgical images for a student model. Two augmentation strategies are evaluated: a naive mask re-generation approach that varies image appearance while preserving semantic layout, and a novel Hybrid Anatomy Injection strategy that procedurally generates new semantic masks by compositing surgical instruments onto real anatomical backgrounds. Experiments on the Cataract-1K dataset show that the proposed semantic augmentation improves the Fréchet Inception Distance by up to 24% over the baseline. By exposing the model to novel instrument-anatomy configurations never seen during training, the semantic augmentation breaks the performance plateau that texture-only variation cannot overcome, enabling the model to continue learning beyond the limits of the original data distribution. For diffusion-based models, which carry strong pretraining biases from large-scale natural image datasets, mask re-generation proves more effective: providing more examples of how surgical scenes look helps these models gradually adapt their pretrained priors to the target domain. Together, these strategies demonstrate that meaningful performance gains in surgical image synthesis can be achieved entirely without collecting new patient data, offering a practical and privacy-friendly path toward more capable generative models in clinical settings.
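The idea of compositing instruments onto real anatomical backgrounds can be illustrated with a minimal sketch. It is not the authors' implementation: the label ids, the function names `inject_instrument` and `random_injection`, and the uniform random placement are all assumptions made for illustration, and masks are plain nested lists to keep the example dependency-free.

```python
import random

# Hypothetical label ids; the paper's actual label map is not given in this post.
BACKGROUND, IRIS, PUPIL, INSTRUMENT = 0, 1, 2, 3


def inject_instrument(anatomy_mask, instrument_mask, top, left):
    """Paste the instrument pixels of `instrument_mask` onto a copy of
    `anatomy_mask`, with the patch's top-left corner at (top, left).
    Pixels that fall outside the anatomy mask are clipped."""
    out = [row[:] for row in anatomy_mask]  # leave the real mask untouched
    height, width = len(out), len(out[0])
    for r, row in enumerate(instrument_mask):
        for c, label in enumerate(row):
            rr, cc = top + r, left + c
            if label == INSTRUMENT and 0 <= rr < height and 0 <= cc < width:
                out[rr][cc] = INSTRUMENT  # the instrument occludes the anatomy
    return out


def random_injection(anatomy_mask, instrument_mask, rng):
    """Pick a random placement (an assumption) so each call can yield a
    new instrument-anatomy configuration."""
    top = rng.randrange(len(anatomy_mask))
    left = rng.randrange(len(anatomy_mask[0]))
    return inject_instrument(anatomy_mask, instrument_mask, top, left)


anatomy = [
    [0, 1, 1, 0],
    [1, 2, 2, 1],
    [1, 2, 2, 1],
    [0, 1, 1, 0],
]
tool = [
    [3, 0],
    [3, 3],
]
composited = inject_instrument(anatomy, tool, top=1, left=2)
```

In a pipeline of the kind described above, a mask such as `composited` would then condition the teacher model, which renders a matching synthetic image for the student.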

Title: Can Swarms Be Trusted? Showcasing Swarm Intelligence and Privacy Preservation Through AR 

Conference: SIMULTECH 2026, Porto, Portugal, 18–20 July 2026

Authors: Melanie Schranz, M. Gojkovic, Horia Vulcu, Kseniia Harshina

Abstract: Swarm intelligence provides a robust approach for decentralized coordination in today’s systems, yet its algorithmic principles, such as local decision-making, role differentiation, and emergent global behavior, are often difficult to convey to individuals without prior experience in swarm-based control. This creates practical barriers when deploying swarm-enabled solutions in domains such as shared electric vehicle charging, energy management, or mobility systems, where engineers, operators, and stakeholders must reliably understand how decentralized processes produce system-level outcomes. To address this challenge, we developed Swarm AR, an Augmented Reality (AR) game that operationalizes a swarm model inspired by the Artificial Bee Colony algorithm and exposes key algorithmic elements, including information propagation, neighborhood interactions, and collective resource allocation. The system also illustrates how decentralization can reduce data concentration, which may support privacy advantages under certain assumptions about information flow and system design, without requiring explicit protection mechanisms. A shared electric vehicle charging scenario serves as a use case to demonstrate load balancing and the necessity of distributed coordination. We evaluate the tool through a mixed-method user study using pre/post quantitative measures and qualitative analysis. Results indicate modest improvements in participants’ understanding of swarm coordination logic, decentralized decision processes, and emergent behavior relevant for infrastructure control. These findings suggest that AR-based interactive visualization can serve as an effective technical aid for communicating, validating, and reasoning about the operational characteristics of self-organizing systems, supporting informed engineering design and deployment of decentralized, privacy-aware coordination strategies.
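To give a feel for the kind of local decision rule such a swarm model relies on, the following sketch applies the roulette-wheel selection used in the onlooker phase of the Artificial Bee Colony algorithm (choice probability proportional to fitness) to a toy charging scenario. It is not the Swarm AR implementation: treating the number of free slots as fitness, the sequential arrival of vehicles, and the names `selection_probabilities`, `choose_station`, and `simulate` are assumptions for illustration only.

```python
import random


def selection_probabilities(free_slots):
    """Onlooker-bee style roulette wheel: p_i = fitness_i / sum(fitness).
    Here the free-slot count of a station plays the role of fitness."""
    total = sum(free_slots)
    if total == 0:
        # No free capacity anywhere: fall back to a uniform choice.
        return [1.0 / len(free_slots)] * len(free_slots)
    return [slots / total for slots in free_slots]


def choose_station(free_slots, rng):
    """Pick one station index using only the visible free-slot counts."""
    probs = selection_probabilities(free_slots)
    return rng.choices(range(len(free_slots)), weights=probs, k=1)[0]


def simulate(free_slots, vehicles, seed=0):
    """Let vehicles arrive one by one; each takes a slot at its chosen station."""
    rng = random.Random(seed)
    slots = list(free_slots)
    assignment = []
    for _ in range(vehicles):
        if sum(slots) == 0:
            break  # every station is full
        station = choose_station(slots, rng)
        slots[station] -= 1
        assignment.append(station)
    return assignment, slots


assignment, remaining = simulate([2, 0, 3], vehicles=5)
```

Each vehicle decides for itself from aggregate slot counts, so no component needs a central record of which vehicle charges where; a full station has zero weight and is never selected.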

Hadi

Title: Advances in Imaging, Perception, and Reasoning for High-Dimensional Visual Data

Conference: VCIP 2026

Abstract: Recent advances in visual sensing, computational imaging, neural representations, and multimodal learning are transforming the way visual data are acquired, processed, communicated, and understood. Modern visual systems increasingly rely on high-dimensional visual data that extend beyond conventional RGB images and videos to include event streams, light fields, hyperspectral and polarization imagery, LiDAR, time-of-flight sensing, neural scene representations, 3D Gaussian splats, and hybrid multimodal sensing modalities. These data capture rich spatial, temporal, geometric, spectral, and cross-modal information, enabling more robust visual processing under challenging conditions such as fast motion, low light, occlusion, missing modalities, and distribution shift. At the same time, the growing complexity and volume of high-dimensional visual data create new challenges in acquisition, restoration, compression, representation, quality assessment, perception, and reasoning. Emerging solutions increasingly integrate imaging, communication, perception, and multimodal intelligence to support reliable visual understanding and decision making. In line with these developments, we invite contributions on computational imaging and novel sensing systems, event-based and multimodal vision, high-dimensional visual restoration and enhancement, learned compression, implicit and neural representations, quality assessment, cross-modal fusion and alignment, robust visual perception, vision-language reasoning, trustworthy AI, and efficient visual communication for next-generation visual systems.

ORGANIZERS

  • Haowen Bai (Nanyang Technological University, SG)
  • Rui Zhao (Nanyang Technological University, SG)
  • Zeyu Xiao (National University of Singapore, SG)
  • Taewoo Kim (INSAIT, BG)
  • Hadi Amirpour (University of Klagenfurt, AT)
  • Tae Hyun Kim (Hanyang University, KR)

Title: An HEVC-based Known-Plaintext Attack for Video Selective Encryption

Authors: Lingfeng Qu, Chen Chen, Jinghan Xu, Yuan Yuan, Ningxiong Mao, Hadi Amirpour

Publication: Springer Nature


Title: Perceptual Reliability in Multimedia: Quality Assessment and Anomaly Analysis

Event: ACM MM 2026, Rio de Janeiro, Brazil, 10–14 November 2026

Presenters: Wei Zhou, Hadi Amirpour, Yang Liu, Patrick Le Callet


Title: Asymmetry-Aware No-Reference Video Quality Assessment via Dual-Region Temporal Modeling

Authors: MohammadAli Hamidi, Hadi Amirpour, Christian Timmerer, Luigi Atzori

Abstract: Saliency- and semantic-driven asymmetric encoding enables significant bitrate savings while maintaining a comparable viewing experience. This paper presents a No-Reference (NR) Video Quality Assessment (VQA) model for evaluating Asymmetrically Encoded Videos (AEV), addressing challenges such as varying compression levels, scaling artifacts, and asymmetric encoding strategies. The proposed approach combines compression-aware features derived from Quantization Parameters (QPs) with spatio-temporal perceptual descriptors capturing blur, motion, and temporal consistency. A hybrid regression framework based on XGBoost and Ridge regression is employed, where a weighted ensemble improves overall performance. Experiments on the dataset provided by the QoMEX VQA-AEV Grand Challenge, evaluated under a Leave-One-Source-Out (LOSO) protocol, show that the proposed method outperforms state-of-the-art NR-VQA models in terms of correlation coefficients (Pearson and Spearman) and root mean square error (RMSE).
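The evaluation ingredients named in the abstract (a weighted ensemble of two regressors, Leave-One-Source-Out splitting, and the Pearson, Spearman, and RMSE metrics) can be sketched in a few lines. This is only an illustration under assumptions: the two prediction lists stand in for the outputs of already-trained XGBoost and Ridge models, the ensemble weight of 0.5 is arbitrary, and all function names are hypothetical.

```python
import math


def loso_splits(source_ids):
    """Yield (held_out_source, train_idx, test_idx) for Leave-One-Source-Out."""
    for held_out in sorted(set(source_ids)):
        train = [i for i, s in enumerate(source_ids) if s != held_out]
        test = [i for i, s in enumerate(source_ids) if s == held_out]
        yield held_out, train, test


def blend(pred_a, pred_b, weight_a):
    """Weighted ensemble of two regressors' predictions (weights sum to 1)."""
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(pred_a, pred_b)]


def pearson(x, y):
    """Pearson linear correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """Average 1-based ranks, so ties are handled as in Spearman's rho."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def rmse(pred, target):
    """Root mean square error between predictions and targets."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))


# Toy data: subjective scores and two stand-in model outputs (assumptions).
mos = [1.0, 2.0, 3.0, 4.0]
pred_a = [1.5, 2.5, 2.5, 4.5]
pred_b = [0.5, 1.5, 3.5, 3.5]
blended = blend(pred_a, pred_b, weight_a=0.5)
```

In this toy case the errors of the two stand-in models cancel, so the blended prediction matches the scores exactly; on real data the weight would be tuned on the training folds of each LOSO split.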


Title: Asymmetry-Aware No-Reference Video Quality Assessment via Dual-Region Temporal Modeling

Authors: Yeganeh Chatri, Hadi Amirpour

Abstract: Modern content-adaptive video encoding increasingly relies on asymmetric compression, where semantically important regions are preserved at higher quality than background areas. This results in spatially and temporally heterogeneous distortion patterns that challenge conventional no-reference video quality assessment (NR-VQA) models, which typically assume spatial homogeneity.

In this work, we propose a lightweight dual-region NR-VQA framework that explicitly models distortion heterogeneity by jointly analyzing global context and a content-focused region using a shared ResNet-18 backbone with temporal mean aggregation. To address limited training data, a two-stage freeze–unfreeze optimization strategy is employed for stable learning.

Experiments on the QoMEX Grand Challenge dataset show that the proposed method achieves an SROCC of 0.881, higher than all NR-VQA baselines evaluated in our experiments, including NIQE, BRISQUE, DOVER, and Q-Align. Additional evaluations on KoNViD-1k and LIVE-VQC indicate consistent generalization across datasets. These results highlight that explicit modeling of spatial heterogeneity is an effective and practical design principle for NR-VQA under asymmetric compression scenarios.
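The data flow of the dual-region design (one shared feature extractor applied to the full frame and to a content-focused region, followed by temporal mean aggregation) can be shown with a toy example. It is a sketch under stated assumptions, not the paper's model: a simple mean/variance extractor stands in for the ResNet-18 backbone, the region is a fixed `(top, left, height, width)` box, and the function names are hypothetical.

```python
def frame_features(frame):
    """Stand-in for the shared backbone: mean and variance of pixel values.
    (The paper uses a ResNet-18; this toy extractor keeps the sketch runnable.)"""
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    var = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    return [mean, var]


def crop(frame, top, left, height, width):
    """Cut the content-focused region out of a frame."""
    return [row[left:left + width] for row in frame[top:top + height]]


def video_descriptor(frames, roi):
    """Apply the same extractor to the full frame and to the ROI crop,
    concatenate both per frame, then average the result over time."""
    per_frame = []
    for frame in frames:
        global_feat = frame_features(frame)              # global context
        region_feat = frame_features(crop(frame, *roi))  # focused region
        per_frame.append(global_feat + region_feat)
    n = len(per_frame)
    return [sum(f[d] for f in per_frame) / n for d in range(len(per_frame[0]))]


# Two tiny 2x2 "frames"; the region of interest is the bottom-right pixel.
frames = [
    [[0, 0], [0, 4]],
    [[2, 2], [2, 2]],
]
descriptor = video_descriptor(frames, roi=(1, 1, 1, 1))
```

Because both branches share one extractor, the descriptor doubles in length without doubling the number of parameters; a regression head would map it to a quality score.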


Title: Quality of Multimedia Experience Meets Machine Intelligence

Authors: Wei Zhou, Hadi Amirpour, Tobias Hossfeld

Abstract: Multimedia systems are evolving towards AI-driven, adaptive services, leading to a natural convergence of QoE and machine intelligence. In this context, machine intelligence can empower QoE through learning-based, context-aware, and semantic-driven modelling and optimization. At the same time, QoE can guide machine intelligence by providing a human-centred objective for AI system design and evaluation; see also [11]. Looking beyond human perception, toward agent-centric and hybrid QoE, future multimedia systems increasingly require unified experience objectives that support human-AI co-experience. QoMEX’26 in Cardiff stands as a major milestone highlighting the convergence of Quality of Multimedia Experience with Machine Intelligence. This column reflects on this evolution and outlines the key challenges ahead.


Title: DAP-Adapter: Enhancing Few-Shot CLIP with Dynamically Diverse and Context-Aware Prompt Generation

Authors: Zongjian Li, Hongyou Chen, Lingfeng Qu, Yongjie Zhu, Ya Pan, Baodan Tian, Yong Fan, Hadi Amirpour

Abstract: Contrastive language-image pretraining (CLIP) has demonstrated powerful zero-shot and few-shot classification capabilities by training on large-scale image-text pairs. However, in the CLIP training paradigm, data augmentation strategies are applied primarily to the image inputs, whereas the text prompts remain fixed throughout the training process. Existing approaches typically rely on static text templates or use a limited number of learnable soft prompts with categories, which restricts the expressiveness of the model in capturing category semantics. In this paper, we propose a novel approach called the dynamic attribute prompt adapter (DAP-Adapter), which leverages large language models to generate diverse textual descriptions. Our approach introduces attributes as intermediate bridges that link categories to their specific descriptions. During training, a batch-level dynamic language mode sampling mechanism is adopted in combination with learnable soft prompts to dynamically construct rich text prompts. To further enhance its ability to capture semantics, DAP-Adapter also integrates a nontrainable CLIP adapter. To evaluate the model performance, experiments were conducted on ten datasets. The experimental results demonstrate that the proposed DAP-Adapter outperforms the state-of-the-art Tip-Adapter-F method.
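The contrast between fixed text templates and prompts that change from batch to batch can be sketched as follows. This is an illustration only, not the DAP-Adapter code: the description table stands in for text generated by a large language model, sampling is assumed to be uniform, the learnable soft prompts and the CLIP adapter are omitted, and the names `DESCRIPTIONS` and `sample_batch_prompts` are hypothetical.

```python
import random

# Hypothetical LLM output: category -> attribute -> candidate descriptions.
DESCRIPTIONS = {
    "cat": {
        "shape": ["with pointed ears", "with a long tail"],
        "texture": ["covered in soft fur"],
    },
    "car": {
        "shape": ["with four wheels"],
        "texture": ["with a glossy metal body"],
    },
}


def sample_batch_prompts(descriptions, rng,
                         template="a photo of a {name}, {detail}."):
    """Draw one attribute and one description per category for this batch,
    so the text side varies across batches instead of staying fixed."""
    prompts = {}
    for name, attributes in descriptions.items():
        attribute = rng.choice(sorted(attributes))  # attribute bridges category and text
        detail = rng.choice(attributes[attribute])
        prompts[name] = template.format(name=name, detail=detail)
    return prompts


rng = random.Random(0)
batches = [sample_batch_prompts(DESCRIPTIONS, rng) for _ in range(3)]
```

Each entry of `batches` maps every category to one full prompt; in training, these strings would be encoded by the text encoder in place of a single static template.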