Paper presented @ IEEE CBMS 2026

Last week, Francesco Marchetto and Klaus Schoeffmann presented their work on synthetic data generation for surgical image synthesis at the IEEE CBMS 2026 (Computer-Based Medical Systems) conference in Limassol, Cyprus. Their paper, entitled “Hybrid Semantic Augmentation for Cataract Surgery Image Synthesis with GANs and Diffusion-based Models”, investigated how augmenting semantics in conditional generative models can help overcome the critical shortage of annotated training data in surgical AI.

The work introduced a student-teacher augmentation framework in which a trained generative model acts as a teacher to produce synthetic surgical images for a student model. Two augmentation strategies were evaluated: a naive mask re-generation approach that varies image appearance while preserving semantic layout, and a novel Hybrid Anatomy Injection strategy that procedurally generates new semantic masks by compositing surgical instruments onto real anatomical backgrounds. Experiments on the Cataract-1K dataset showed that the proposed semantic augmentation achieves up to a 24% improvement in Fréchet Inception Distance over the baseline. By exposing the model to novel instrument-anatomy configurations never seen during training, the semantic augmentation breaks the performance plateau that texture-only variation cannot overcome, enabling the model to continue learning beyond the limits of the original data distribution. For diffusion-based models, which carry strong pretraining biases from large-scale natural image datasets, mask re-generation proved more effective: providing more examples of how surgical scenes look helps these models gradually adapt their pretrained priors to the target domain. Together, these strategies demonstrate that meaningful performance gains in surgical image synthesis can be achieved entirely without collecting new patient data, offering a practical and privacy-friendly path toward more capable generative models in clinical settings.
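To make the idea of compositing instruments onto anatomical backgrounds more concrete, here is a minimal sketch in Python. It is not the authors' implementation: the function name, the label ids, and the paste position are hypothetical, and the placement logic of the actual Hybrid Anatomy Injection strategy is not described in this post.

```python
from typing import List

Mask = List[List[int]]

INSTRUMENT_LABEL = 9  # hypothetical class id for a surgical instrument


def inject_instrument(anatomy: Mask, instrument: Mask, top: int, left: int,
                      label: int = INSTRUMENT_LABEL) -> Mask:
    """Paste the non-zero pixels of a binary instrument mask onto a copy of an anatomy mask."""
    out = [row[:] for row in anatomy]  # copy, so the real mask stays untouched
    height, width = len(out), len(out[0])
    for dy, row in enumerate(instrument):
        for dx, value in enumerate(row):
            y, x = top + dy, left + dx
            # Only instrument pixels that fall inside the image overwrite anatomy labels.
            if value and 0 <= y < height and 0 <= x < width:
                out[y][x] = label
    return out


# Toy 4x4 semantic mask (1 = background tissue, 2 = pupil; labels are made up).
anatomy_mask = [
    [1, 1, 1, 1],
    [1, 2, 2, 1],
    [1, 2, 2, 1],
    [1, 1, 1, 1],
]
# Toy binary instrument silhouette.
instrument_mask = [
    [1, 1],
    [0, 1],
]
new_mask = inject_instrument(anatomy_mask, instrument_mask, top=1, left=2)
```

The resulting mask keeps the real anatomy everywhere except where the instrument silhouette lands, which is what gives the conditional generator a layout it has not seen in the original data.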

June 9, 2026

games, Publication

Paper accepted @ SIMULTECH 2026

Title: Can Swarms Be Trusted? Showcasing Swarm Intelligence and Privacy Preservation Through AR

Conference: SIMULTECH 2026, Porto, Portugal, 18–20 July 2026

Authors: Melanie Schranz, M. Gojkovic, Horia Vulcu, Kseniia Harshina

Abstract: Swarm intelligence provides a robust approach to decentralized coordination in modern systems, yet its algorithmic principles, such as local decision-making, role differentiation, and emergent global behavior, are often difficult to convey to individuals without prior experience in swarm-based control. This creates practical barriers when deploying swarm-enabled solutions in domains such as shared electric vehicle charging, energy management, or mobility systems, where engineers, operators, and stakeholders must reliably understand how decentralized processes produce system-level outcomes. To address this challenge, we developed Swarm AR, an Augmented Reality (AR) game that operationalizes a swarm model inspired by the Artificial Bee Colony algorithm and exposes key algorithmic elements, including information propagation, neighborhood interactions, and collective resource allocation. The system also illustrates how decentralization can reduce data concentration, which may support privacy advantages under certain assumptions about information flow and system design, without requiring explicit protection mechanisms. A shared electric vehicle charging scenario serves as a use case to demonstrate load balancing and the necessity of distributed coordination. We evaluate the tool through a mixed-method user study using pre/post quantitative measures and qualitative analysis. Results indicate modest improvements in participants’ understanding of swarm coordination logic, decentralized decision processes, and emergent behavior relevant for infrastructure control. These findings suggest that AR-based interactive visualization can serve as an effective technical aid for communicating, validating, and reasoning about the operational characteristics of self-organizing systems, supporting informed engineering design and deployment of decentralized, privacy-aware coordination strategies.

June 8, 2026

Announcement, itec, MMC

Special Session @ VCIP 2026

Title: Advances in Imaging, Perception, and Reasoning for High-Dimensional Visual Data

Conference: VCIP 2026

Abstract: Recent advances in visual sensing, computational imaging, neural representations, and multimodal learning are transforming the way visual data are acquired, processed, communicated, and understood. Modern visual systems increasingly rely on high-dimensional visual data that extend beyond conventional RGB images and videos to include event streams, light fields, hyperspectral and polarization imagery, LiDAR, time-of-flight sensing, neural scene representations, 3D Gaussian splats, and hybrid multimodal sensing modalities. These data capture rich spatial, temporal, geometric, spectral, and cross-modal information, enabling more robust visual processing under challenging conditions such as fast motion, low light, occlusion, missing modalities, and distribution shift. At the same time, the growing complexity and volume of high-dimensional visual data create new challenges in acquisition, restoration, compression, representation, quality assessment, perception, and reasoning. Emerging solutions increasingly integrate imaging, communication, perception, and multimodal intelligence to support reliable visual understanding and decision making. In line with these developments, we invite contributions on computational imaging and novel sensing systems, event-based and multimodal vision, high-dimensional visual restoration and enhancement, learned compression, implicit and neural representations, quality assessment, cross-modal fusion and alignment, robust visual perception, vision-language reasoning, trustworthy AI, and efficient visual communication for next-generation visual systems.

ORGANIZERS

Haowen Bai (Nanyang Technological University, SG)
Rui Zhao (Nanyang Technological University, SG)
Zeyu Xiao (National University of Singapore, SG)
Taewoo Kim (INSAIT, BG)
Hadi Amirpour (University of Klagenfurt, AT)
Tae Hyun Kim (Hanyang University, KR)

June 8, 2026

Publication

Paper accepted @ Springer Nature Scientific Reports

Title: An HEVC-based Known-Plaintext Attack for Video Selective Encryption

Authors: Lingfeng Qu, Chen Chen, Jinghan Xu, Yuan Yuan, Ningxiong Mao, Hadi Amirpour

Publication: Springer Nature

June 5, 2026

tutorial

Tutorial accepted @ ACM MM 2026

Title: Perceptual Reliability in Multimedia: Quality Assessment and Anomaly Analysis

Event: ACM MM 2026, Rio de Janeiro, Brazil, 10–14 November 2026

Presenters: Wei Zhou, Hadi Amirpour, Yang Liu, Patrick Le Callet

June 5, 2026

Publication

Paper accepted @ QoMEX 2026

Title: Asymmetry-Aware No-Reference Video Quality Assessment via Dual-Region Temporal Modeling

Authors: MohammadAli Hamidi, Hadi Amirpour, Christian Timmerer, Luigi Atzori

Abstract: Saliency and semantic-driven asymmetric encoding enable significant bitrate savings while maintaining a comparable viewing experience. This paper presents a No-Reference (NR) Video Quality Assessment (VQA) model for evaluating Asymmetrically Encoded Videos (AEV), addressing challenges such as varying compression levels, scaling artifacts, and asymmetric encoding strategies. The proposed approach combines compression-aware features derived from Quantization Parameters (QPs) with spatio-temporal perceptual descriptors capturing blur, motion, and temporal consistency. A hybrid regression framework based on XGBoost and Ridge regression is employed, where a weighted ensemble improves overall performance. Experimental results conducted on the dataset provided by the QoMEX VQA-AEV Grand Challenge, evaluated under a Leave-One-Source-Out (LOSO) protocol, show that the proposed method outperforms state-of-the-art NR-VQA models in terms of correlation coefficients (Pearson and Spearman) and root mean square error (RMSE).
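The evaluation setup named in the abstract (Leave-One-Source-Out folds, a weighted ensemble of two regressors, and Pearson, Spearman, and RMSE scores) can be sketched in plain Python as follows. This is an illustration under assumptions, not the authors' code: the XGBoost and Ridge models themselves are not reproduced, the function names are hypothetical, and the ensemble weight is a free parameter whose actual value is not given in the abstract.

```python
import math
from typing import Iterator, List, Sequence, Tuple


def loso_splits(sources: Sequence[str]) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_source, train_indices, test_indices), one fold per source video."""
    for held_out in sorted(set(sources)):
        train = [i for i, s in enumerate(sources) if s != held_out]
        test = [i for i, s in enumerate(sources) if s == held_out]
        yield held_out, train, test


def blend(pred_a: Sequence[float], pred_b: Sequence[float], weight_a: float) -> List[float]:
    """Weighted ensemble of two models' predictions (weight_a is an assumed parameter)."""
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(pred_a, pred_b)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson linear correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def rmse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Root mean square error between predictions and subjective scores."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))


# Toy usage: five clips derived from three source videos (made-up labels).
folds = list(loso_splits(["a", "a", "b", "b", "c"]))
blended = blend([4.0, 2.0], [2.0, 4.0], weight_a=0.75)
```

Holding out every clip of one source at a time keeps content from the test source out of training, which is the point of a LOSO protocol.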

June 3, 2026

Publication

Paper accepted @ QoMEX 2026

Title: Asymmetry-Aware No-Reference Video Quality Assessment via Dual-Region Temporal Modeling

Authors: Yeganeh Chatri, Hadi Amirpour

Abstract: Modern content-adaptive video encoding increasingly relies on asymmetric compression, where semantically important regions are preserved at higher quality than background areas. This results in spatially and temporally heterogeneous distortion patterns that challenge conventional no-reference video quality assessment (NR-VQA) models, which typically assume spatial homogeneity.

In this work, we propose a lightweight dual-region NR-VQA framework that explicitly models distortion heterogeneity by jointly analyzing global context and a content-focused region using a shared ResNet-18 backbone with temporal mean aggregation. To address limited training data, a two-stage freeze–unfreeze optimization strategy is employed for stable learning.

Experiments on the QoMEX Grand Challenge dataset show that the proposed method achieves an SROCC of 0.881, the highest among the evaluated NR-VQA baselines in our experiments, including NIQE, BRISQUE, DOVER, and Q-Align. Additional evaluations on KoNViD-1k and LIVE-VQC indicate consistent generalization across datasets. These results highlight that explicit modeling of spatial heterogeneity is an effective and practical design principle for NR-VQA under asymmetric compression scenarios.
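As a rough illustration of the dual-region idea with temporal mean aggregation, the following Python sketch encodes a global stream and a content-focused stream with one shared encoder and averages the per-frame features over time. It is a toy under stated assumptions: the paper uses a ResNet-18 backbone, whereas `toy_backbone` here is a hypothetical stand-in, and fusing the two streams by concatenation is an assumption, since the abstract does not state the fusion step.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Frame = List[float]  # a frame flattened to pixel values, for illustration only


def temporal_mean(frame_features: Sequence[Vector]) -> Vector:
    """Average per-frame feature vectors over time into one clip-level vector."""
    n = len(frame_features)
    return [sum(column) / n for column in zip(*frame_features)]


def dual_region_descriptor(global_frames: Sequence[Frame],
                           region_frames: Sequence[Frame],
                           backbone: Callable[[Frame], Vector]) -> Vector:
    """Encode both streams with the same backbone, pool over time, then concatenate."""
    global_vec = temporal_mean([backbone(f) for f in global_frames])
    region_vec = temporal_mean([backbone(f) for f in region_frames])
    return global_vec + region_vec  # list concatenation (assumed fusion)


def toy_backbone(frame: Frame) -> Vector:
    """Hypothetical stand-in for a CNN encoder: mean and max intensity of the frame."""
    return [sum(frame) / len(frame), max(frame)]


# Two frames per stream: full-frame context and a cropped region of interest.
global_frames = [[0.0, 2.0], [2.0, 4.0]]
region_frames = [[1.0, 1.0], [3.0, 3.0]]
descriptor = dual_region_descriptor(global_frames, region_frames, toy_backbone)
```

Because both streams pass through the same encoder, the model adds no extra backbone parameters for the second region, which fits the lightweight design the abstract describes.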

June 3, 2026

Publication

Paper @ International Journal of Pattern Recognition and Artificial Intelligence

Title: DAP-Adapter: Enhancing Few-Shot CLIP with Dynamically Diverse and Context-Aware Prompt Generation

Authors: Zongjian Li, Hongyou Chen, Lingfeng Qu, Yongjie Zhu, Ya Pan, Baodan Tian, Yong Fan, Hadi Amirpour

Abstract: Contrastive language-image pretraining (CLIP) has demonstrated powerful zero-shot and few-shot classification capabilities by training on large-scale image-text pairs. However, in the CLIP training paradigm, data augmentation strategies are applied primarily to the image inputs, whereas the text prompts remain fixed throughout the training process. Existing approaches typically rely on static text templates or use a limited number of learnable soft prompts with categories, which restricts the expressiveness of the model in capturing category semantics. In this paper, we propose a novel approach called the dynamic attribute prompt adapter (DAP-Adapter), which leverages large language models to generate diverse textual descriptions. Our approach introduces attributes as intermediate bridges that link categories to their specific descriptions. During training, a batch-level dynamic language mode sampling mechanism is adopted in combination with learnable soft prompts to dynamically construct rich text prompts. To further enhance its ability to capture semantics, DAP-Adapter also integrates a nontrainable CLIP adapter. To evaluate the model performance, experiments were conducted on ten datasets. The experimental results demonstrate that the proposed DAP-Adapter outperforms the state-of-the-art Tip-Adapter-F method.

June 3, 2026

Publication

Paper accepted @ QoMEX 2026

Title: QoMEX 2026 Grand Challenge on Video Quality Assessment for Asymmetric Encoded Videos: Methods and Results

Authors: Jingwen Zhu, Hadi Amirpour, Christian Timmerer, et al.

Abstract: This paper presents the results of the Grand Challenge on Video Quality Assessment for Asymmetric Encoded Videos, held at QoMEX 2026 in Cardiff, UK. The challenge addresses the growing need for video quality metrics (VQM) capable of accurately predicting the perceptual quality of asymmetrically encoded videos, where saliency-driven or semantic-based encoding allocates different quality levels to different spatial regions. Participants were provided with the Sport-ROI dataset containing subjective quality scores and were invited to develop both full-reference (FR) and no-reference (NR) VQM models. We describe the challenge design, the dataset, and the evaluation methodology, and summarize the submitted approaches and their performance.

June 3, 2026

Announcement, Workshop

EMS 2026 – The 4th Workshop on Emerging Multimedia Systems

EMS 2023 | EMS 2024 | EMS 2025

Call For Submissions

Multimedia has played a significant role in driving Internet usage and has led to a range of technological advancements, such as content delivery networks, compression algorithms, and streaming protocols. With emerging applications, including (but not limited to) augmented, virtual, and extended reality (XR), real-time telepresence, AI-generated content, video analytics, and the usage of AI in multimedia systems in general, multimedia is undergoing a fundamental shift in sharing experiences online and continues to drive the future of the Internet. As these next-generation ultra-low-latency, interactive, and immersive technologies evolve, it is crucial to revisit established techniques for new formats and representations, not only to enhance performance and interactivity but also to improve energy efficiency and maintain high Quality of Experience (QoE). This workshop will bring together experts from diverse fields, including video streaming research, source video coding, analytics, rate adaptation algorithms, networked systems, immersive media such as 3D and volumetric video streaming, AR/VR applications, as well as energy-efficient systems and QoE optimization, to exchange ideas on identifying challenges and opportunities in designing advanced networked systems for these emerging multimedia technologies. This workshop is a successor of the Emerging Multimedia Systems (EMS) workshop from ACM SIGCOMM.

Topics of interest in Call-for-Papers

This workshop calls for research on issues and solutions that can enable live video analytics with the help of edge computing. Topics of interest include (but are not limited to) the following:

Networked systems for immersive content capture, streaming, and display
Networked systems for AI-driven video applications
Networked systems for multimedia generative AI
Machine learning for emerging multimedia distribution
Emerging multimedia systems for novel content formats (point clouds, light fields, holography, NeRF, 3DGS, etc.)
Volumetric video delivery on the Internet
Ultra-low-latency networking for multimedia applications
High-throughput transport and distribution for emerging media
Adaptive streaming under network/user constraints for immersive media
Novel content distribution networks for AR/VR applications
Management of AR/VR networked systems
Wireless and mobile immersive systems
AR/VR applications in current (5G) and future (6G) wireless networks
Compression and transmission design for 3D content
Edge cloud systems for immersive experiences
Quality of Experience in emerging multimedia systems
Energy efficiency in emerging multimedia systems

Besides typical full research papers that present a complete idea with proper evaluation, we also welcome work-in-progress papers.

Full research papers present new research that has not been previously published. Authors should submit their work describing early/emerging results in a relevant topic area. Full papers are limited to 6 pages, including figures, tables, appendices, and references.
Lightning papers can provide a summary of early, emerging, or ongoing work, as well as short updates of previously published work. Lightning papers will be presented and published in the proceedings. They should not exceed 2 pages, with a maximum of one additional page for references only.