Adaptive Compressed Domain Video Encryption

Expert Systems with Applications

Mohammad Ghasempour (AAU, Austria), Yuan Yuan (Southwest Jiaotong University), Hadi Amirpour (AAU, Austria), Hongjie He (Southwest Jiaotong University), and Christian Timmerer (AAU, Austria)

Abstract: With the ever-increasing amount of digital video content, efficient encryption is crucial to protect visual content across diverse platforms. Existing methods often incur excessive bitrate overhead due to content variability. Furthermore, since most videos are already compressed, encryption in the compressed domain is essential to avoid processing overhead and re-compression quality loss. However, achieving both format compliance and compression efficiency while ensuring that the decoded content remains unrecognizable is challenging in the compressed domain, since only limited information is available without full decoding. This paper proposes an adaptive compressed domain video encryption (ACDC) method that dynamically adjusts the encryption strategy according to content characteristics. Two tunable parameters derived from the bitstream information enable adaptation to various application requirements. An adaptive syntax integrity method is employed to produce format-compliant bitstreams without full decoding. Experimental results show that ACDC reduces bitrate overhead by 48.2% and achieves a 31-fold speedup in encryption time compared to the latest state of the art, while producing visually unrecognizable outputs.
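The method operates on bitstream syntax elements without full decoding, so one way to picture the core operation is keystream-based sign flipping. The sketch below is a minimal illustration under our own assumptions (a SHA-256 counter keystream and a single hypothetical `ratio` tunable standing in for ACDC's two content-derived parameters), not the authors' algorithm:

```python
import hashlib

def keystream(key: bytes, n: int) -> bytes:
    """Derive n pseudo-random bytes by hashing key||counter (illustrative
    construction only, not a vetted cipher)."""
    out = bytearray()
    ctr = 0
    while len(out) < n:
        out += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return bytes(out[:n])

def encrypt_signs(signs: list[int], key: bytes, ratio: float) -> list[int]:
    """XOR a tunable fraction of coefficient sign bits with the keystream.
    'ratio' is a hypothetical stand-in for ACDC's content-adaptive tuning."""
    ks = keystream(key, len(signs))
    step = max(1, int(1 / max(ratio, 1e-6)))
    return [s ^ (ks[i] & 1) if i % step == 0 else s
            for i, s in enumerate(signs)]
```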


Title: Indistinguishability Analysis of JPEG Image Encryption Schemes

Authors: Yuan Yuan, Lingfeng Qu, Ji Zhang, Ningxiong Mao, Hadi Amirpour

Abstract: JPEG images are widely used for communication and storage, making secure encryption essential for privacy protection. Existing JPEG encryption studies primarily rely on empirical metrics such as visual distortion, key space, or correlation, while overlooking the formal indistinguishability against chosen-plaintext attacks (IND-CPA) property. This work provides the first systematic analysis of existing JPEG encryption schemes from the IND-CPA perspective. A new metric, termed feature change rate, is introduced to quantify the preservation of residual features. Furthermore, the relationship between feature change rate and key estimation success under CPA is established, indicating that smaller feature changes result in higher attack accuracy. Based on these findings, we propose a set of design principles for constructing secure and practical JPEG encryption schemes. Finally, we outline a feature-changing encryption strategy that enhances IND-CPA security while maintaining JPEG compatibility and compression efficiency.
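The abstract does not define the metric formally; a minimal plausible reading, assuming block-level features extracted from both the plain and the encrypted image, is the fraction of features that change under encryption:

```python
import numpy as np

def feature_change_rate(plain_feats: np.ndarray, cipher_feats: np.ndarray,
                        tol: float = 1e-6) -> float:
    """Fraction of per-block features that change under encryption.
    A rate near 1 means little residual plaintext structure survives;
    near 0 means features leak through (weaker IND-CPA resistance)."""
    changed = np.abs(plain_feats - cipher_feats) > tol
    return float(changed.mean())

# Toy usage: DC coefficients of 8x8 blocks as a block-level feature.
plain = np.array([10.0, -3.0, 7.0, 0.0])
cipher = np.array([10.0, 5.0, -2.0, 0.0])   # two of four blocks changed
print(feature_change_rate(plain, cipher))    # 0.5
```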

QoE Modeling in Volumetric Video Streaming: A Short Survey

IEEE/IFIP Network Operations and Management Symposium (NOMS) 2026

Rome, Italy, 18–22 May 2026


Mojtaba Mozhganfar (University of Tehran), Masoumeh Khodarahmi (IMDEA), Daniele Lorenzi (Bitmovin), Mahdi Dolati (Sharif University of Technology), Farzad Tashtarian (Alpen-Adria-Universität Klagenfurt), Ahmad Khonsari (University of Tehran), Christian Timmerer (Alpen-Adria-Universität Klagenfurt)

Abstract

Volumetric video streaming enables six degrees of freedom (6DoF) interaction, allowing users to navigate freely within immersive 3D environments. Despite notable advancements, volumetric video remains an emerging field, presenting ongoing challenges and vast opportunities in content capture, compression, transmission, decompression, rendering, and display. As user expectations grow, delivering high Quality of Experience (QoE) in these systems becomes increasingly critical due to the complexity of volumetric content and the demands of interactive streaming. This paper reviews recent progress in QoE for volumetric streaming, beginning with an overview of QoE evaluation studies for volumetric video streaming, including subjective assessments tailored to 6DoF content. The core focus of this work is on objective QoE modeling, where we analyze existing models based on their input factors and methodological strategies. Finally, we discuss the key challenges and promising research directions for building perceptually accurate and adaptable QoE models that can support the future of immersive volumetric media.
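As a concrete (and deliberately simplistic) illustration of what an objective QoE model's input factors look like, the toy linear model below combines visual quality with stall, switching, and interaction-latency penalties; the coefficients are placeholders, not values from any surveyed model:

```python
def qoe_score(visual_quality, stall_ratio, switch_rate, latency_ms):
    """Illustrative linear objective QoE model: reward visual quality
    (0-5 MOS-like scale), penalize stalling, quality switching, and
    interaction latency. Weights are arbitrary placeholders."""
    return (visual_quality
            - 3.0 * stall_ratio          # fraction of session spent stalled
            - 0.5 * switch_rate          # quality switches per minute
            - 0.002 * latency_ms)        # motion-to-photon latency penalty

print(qoe_score(4.2, 0.05, 1.0, 30))     # -> 3.49
```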

Resource Management for Distributed Binary Neural Networks in Programmable Data Plane

IEEE/IFIP Network Operations and Management Symposium (NOMS) 2026

Rome, Italy, 18–22 May 2026


Fatemeh Babaei (Sharif University of Technology), Mahdi Dolati (Sharif University of Technology), Mojtaba Mozhganfar (University of Tehran), Sina Darabi (Università della Svizzera Italiana), Farzad Tashtarian (Alpen-Adria-Universität Klagenfurt)

Abstract

Programmable networks enable the deployment of customized network functions that can process traffic at line rate. The growing traffic volume and the increasing complexity of network management have motivated the use of data-driven and machine learning–based functions within the network. Recent studies demonstrate that machine learning models can be fully executed in the data plane to achieve low latency. However, the limited hardware resources of programmable switches pose a significant challenge for deploying such functions. This work investigates Binary Neural Networks (BNNs) as an effective mechanism for implementing network functions entirely in the data plane. We propose a network-wide resource allocation algorithm that exploits the inherent distributability of neural networks across multiple switches. The algorithm builds on the linear programming relaxation and randomized rounding framework to achieve efficient resource utilization. We implement our approach using Mininet and bmv2 software switches. Comprehensive evaluations on two public datasets show that our method attains near-optimal performance in small-scale networks and consistently outperforms baseline schemes in larger deployments.
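To make the LP-relaxation-plus-randomized-rounding idea concrete, here is a toy sketch (our own instance, not the paper's formulation): assign BNN layers to switches at minimum cost, solve the relaxed LP, then round each layer's fractional assignment into a single switch:

```python
import numpy as np
from scipy.optimize import linprog

# Toy instance (made-up numbers): assign L BNN layers to S switches at
# minimum cost without exceeding per-switch resource budgets.
L, S = 3, 2
cost = np.array([[1.0, 2.0],
                 [2.0, 1.0],
                 [1.5, 1.5]])          # cost[l, s] of layer l on switch s
demand = np.array([2.0, 3.0, 1.0])    # resource demand of each layer
capacity = np.array([4.0, 4.0])       # budget of each switch

# LP relaxation: x[l, s] in [0, 1]; every layer fully assigned; budgets kept.
c = cost.flatten()
A_eq = np.zeros((L, L * S))
for l in range(L):
    A_eq[l, l * S:(l + 1) * S] = 1.0   # sum_s x[l, s] = 1
A_ub = np.zeros((S, L * S))
for s in range(S):
    for l in range(L):
        A_ub[s, l * S + s] = demand[l]  # sum_l demand[l] * x[l, s] <= cap[s]
res = linprog(c, A_ub=A_ub, b_ub=capacity, A_eq=A_eq, b_eq=np.ones(L),
              bounds=(0, 1))
x = np.clip(res.x.reshape(L, S), 0, None)

# Randomized rounding: place layer l on switch s with probability x[l, s].
rng = np.random.default_rng(0)
assignment = [int(rng.choice(S, p=row / row.sum())) for row in x]
print(assignment)   # e.g. [0, 1, 0]
```

Rounded solutions can transiently exceed a switch's budget, so a practical pipeline would follow this with a repair or re-rounding pass.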

YTLive: A Dataset of Real-World YouTube Live Streaming Sessions

IEEE/IFIP Network Operations and Management Symposium (NOMS) 2026

Rome, Italy, 18–22 May 2026


Mojtaba Mozhganfar (University of Tehran), Pooya Jamshidi (University of Tehran), Seyyed Ali Aghamiri (University of Tehran), Mohsen Ghasemi (Sharif University of Technology), Mahdi Dolati (Sharif University of Technology), Farzad Tashtarian (Alpen-Adria-Universität Klagenfurt), Ahmad Khonsari (University of Tehran), Christian Timmerer (Alpen-Adria-Universität Klagenfurt)

Abstract

Live streaming plays a major role in today’s digital platforms, supporting entertainment, education, social media, and more. However, research in this field is limited by the lack of large, publicly available datasets that capture real-time viewer behavior at scale. To address this gap, we introduce YTLive, a public dataset focused on YouTube Live. Collected through the YouTube Researcher Program during May and June 2024, YTLive includes more than 507,000 records from 12,156 live streams, tracking concurrent viewer counts at five-minute intervals along with precise broadcast durations. We describe the dataset design and collection process and present an initial analysis of temporal viewing patterns. Results show that viewer counts are higher and more stable on weekends, especially during afternoon hours. Shorter streams attract larger and more consistent audiences, while longer streams tend to grow slowly and exhibit greater variability. These insights have direct implications for adaptive streaming, resource allocation, and Quality of Experience (QoE) modeling. YTLive offers a timely, open resource to support reproducible research and system-level innovation in live streaming. The dataset is publicly available at: https://github.com/ghalandar/YTLive.
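Once the CSV is downloaded, a quick way to reproduce the weekend/weekday observation might look like the following; the column names ('timestamp', 'stream_id', 'viewers') are our guesses, so check the repository's README for the actual schema:

```python
import pandas as pd

# Hypothetical schema: one row per (stream, 5-minute sample).
df = pd.read_csv("ytlive.csv", parse_dates=["timestamp"])
df["is_weekend"] = df["timestamp"].dt.dayofweek >= 5

# Average concurrent viewers, weekday vs. weekend
# (the paper reports weekends higher and more stable).
print(df.groupby("is_weekend")["viewers"].mean())

# Per-stream variability vs. broadcast duration.
per_stream = df.groupby("stream_id")["viewers"].agg(["mean", "std", "count"])
per_stream["duration_h"] = per_stream["count"] * 5 / 60   # 5-minute samples
print(per_stream[["duration_h", "mean", "std"]].corr())
```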

Conference: IEEE International Conference on Image Processing 2026

13–17 September 2026

Tampere, Finland

Special Session Proposal 1: Generative Visual Coding: Emerging Paradigms for Future Communication

Special Session Proposal 2: Visual Information Processing for Human-Centered Immersive Experiences

Author: Hadi Amirpourazarian

Conference: 24th International Conference on e-Society (ES 2026) 

Paper Title: Gamification in the Age of AI: Surveying Player Perceptions of Motivation, Manipulation, and Data Protection

Authors: Kurt Horvath, Tom Tucek

Abstract: Modern games and applications increasingly employ artificial intelligence (AI), user modeling, and gamification to influence user behavior, sustain engagement, and monetize attention. Although these mechanisms can enhance motivation and enjoyment, they also raise concerns about manipulation, personal data usage, and transparency. To examine how players perceive these dynamics, we conducted a survey with 28 adult participants. Respondents expressed strong concerns about the collection and exploitation of personal information in games and gamified applications, while also reporting low awareness and a limited sense of control over how their data is used. They evaluated progression features such as leveling systems and leaderboards as motivating; in contrast, they rated retention-focused mechanics such as daily login rewards and random reward reveals as highly manipulative, reporting that these systems can create fear of missing out or resemble gambling. Participants also expressed support for regulatory measures that limit exploitative design and protect vulnerable users. These findings reveal a clear gap between increasingly powerful engagement strategies and user trust, underscoring the need for more responsible and transparent gamification practices.

2026 IEEE International Conference on Acoustics, Speech, and Signal Processing

4–8 May 2026

Barcelona, Spain

Paper title: Dual-guided Generative Frame Interpolation

Yiying Wei (AAU, Austria), Hadi Amirpour (AAU, Austria) and Christian Timmerer (AAU, Austria)

Abstract: Video frame interpolation (VFI) aims to generate intermediate frames between given keyframes to enhance temporal resolution and visual smoothness. While conventional optical flow–based methods and recent generative approaches achieve promising results, they often struggle with large displacements, failing to maintain temporal coherence and semantic consistency. In this work, we propose dual-guided generative frame interpolation (DGFI), a framework that integrates semantic guidance from vision-language models and flow guidance into a pre-trained diffusion-based image-to-video (I2V) generator. Specifically, DGFI extracts textual descriptions and injects multimodal embeddings to capture high-level semantics, while estimated motion guidance provides smooth transitions. Experiments on public datasets demonstrate the effectiveness of our dual-guided method over state-of-the-art approaches.
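The abstract implies two conditioning streams feeding one backbone; a minimal sketch of such a fusion module, with all dimensions and the concatenation design being our assumptions rather than the DGFI architecture, could look like this:

```python
import torch
import torch.nn as nn

class DualGuidanceFusion(nn.Module):
    """Illustrative fusion of semantic (text) and motion (flow) guidance into
    one conditioning tensor for a diffusion I2V backbone. Dimensions and the
    fusion design are our assumptions, not the DGFI architecture."""
    def __init__(self, text_dim=768, flow_dim=256, cond_dim=1024):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, cond_dim)
        self.flow_proj = nn.Linear(flow_dim, cond_dim)

    def forward(self, text_emb, flow_emb):
        # Concatenate projected embeddings along the token axis so the
        # backbone's cross-attention can attend to both guidance sources.
        return torch.cat([self.text_proj(text_emb),
                          self.flow_proj(flow_emb)], dim=1)

fusion = DualGuidanceFusion()
cond = fusion(torch.randn(1, 77, 768), torch.randn(1, 16, 256))
print(cond.shape)  # torch.Size([1, 93, 1024])
```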

Conference: International Symposium on Biomedical Imaging (ISBI 2026), April 8-11, 2026, London, UK

Paper Title: SAM-Fed: SAM-Guided Federated Semi-Supervised Learning for Medical Image Segmentation

Authors: Sahar Nasirihaghighi, Negin Ghamsarian, Yiping Li, Marcel Breeuwer, Raphael Sznitman, and Klaus Schoeffmann

Abstract:

Medical image segmentation is clinically important, yet data privacy and the cost of expert annotation limit the availability of labeled data. Federated semi-supervised learning (FSSL) offers a solution but faces two challenges: pseudo-label reliability depends on the strength of local models, and client devices often require compact or heterogeneous architectures due to limited computational resources. These constraints reduce the quality and stability of pseudo-labels, while large models, though more accurate, cannot be trained or used for routine inference on client devices. We propose SAM-Fed, a federated semi-supervised framework that leverages a high-capacity segmentation foundation model to guide lightweight clients during training. SAM-Fed combines dual knowledge distillation with an adaptive agreement mechanism to refine pixel-level supervision. Experiments on skin lesion and polyp segmentation across homogeneous and heterogeneous settings show that SAM-Fed consistently outperforms state-of-the-art FSSL methods.
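One plausible reading of the adaptive agreement mechanism (our assumption, not SAM-Fed's exact rule) is a pixel-wise mask that keeps the distillation loss only where the teacher's pseudo-label and the SAM-derived mask agree:

```python
import torch
import torch.nn.functional as F

def agreement_weighted_distill(student_logits, teacher_logits, sam_mask,
                               tau=0.5):
    """Illustrative pixel-level distillation: down-weight pixels where the
    teacher's pseudo-label and the SAM-derived mask disagree. The agreement
    rule and weighting are our assumptions, not SAM-Fed's mechanism."""
    teacher_pred = (torch.sigmoid(teacher_logits) > tau).float()
    agree = (teacher_pred == sam_mask).float()   # 1 where sources agree
    per_pixel = F.binary_cross_entropy_with_logits(
        student_logits, teacher_pred, reduction="none")
    return (agree * per_pixel).sum() / agree.sum().clamp(min=1.0)

s = torch.randn(2, 1, 64, 64)                    # student logits
t = torch.randn(2, 1, 64, 64)                    # teacher pseudo-label logits
m = torch.randint(0, 2, (2, 1, 64, 64)).float()  # SAM-derived mask
print(agreement_weighted_distill(s, t, m))
```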

Paper title: ELLMPEG: An Edge-based Agentic LLM Video Processing Tool

Authors: Zoha Azimi, Reza Farahani, Radu Prodan, Christian Timmerer

Venue: MMSys’26, the 17th ACM Multimedia Systems Conference, Hong Kong SAR, 4–8 April 2026

Abstract:

Large language models (LLMs), the foundation of generative AI systems like ChatGPT, are transforming many fields and applications, including multimedia, enabling more advanced content generation, analysis, and interaction. However, cloud-based LLM deployments face three key limitations: high computational and energy demands, privacy and reliability risks from remote processing, and recurring API costs. Recent advances in agentic AI, especially in structured reasoning and tool use, offer a better way to exploit open and locally deployed tools and LLM models. This paper presents ELLMPEG, an edge-enabled agentic LLM framework for the automated generation of video-processing commands. ELLMPEG integrates tool-aware Retrieval-Augmented Generation (RAG) with iterative self-reflection to produce and locally verify executable FFmpeg and VVenC commands directly at the edge, eliminating reliance on external cloud APIs. To evaluate ELLMPEG, we collect a dedicated prompt dataset comprising 480 diverse queries covering different categories of FFmpeg and the Versatile Video Coding (VVC) encoder (VVenC) commands. We validate command-generation accuracy and evaluate four open-source LLMs based on command validity, tokens generated per second, inference time, and energy efficiency. We also execute the generated commands to assess their runtime correctness and practical applicability. Experimental results show that Qwen2.5, when augmented with the ELLMPEG framework, achieves an average command-generation accuracy of 78% with zero recurring API cost, outperforming all other open-source models across both the FFmpeg and VVenC datasets.
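The generate-verify-reflect loop at the heart of such a system can be sketched in a few lines; this is our illustration under stated assumptions (a generic local `llm` callable and dry-run validation via the process return code), not the ELLMPEG implementation:

```python
import subprocess
from typing import Callable

def validate_command(cmd: list[str]) -> str | None:
    """Run the generated command locally; return stderr on failure so the
    error can be fed back into the next generation attempt."""
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=120)
    return None if proc.returncode == 0 else proc.stderr

def agentic_loop(user_query: str, rag_context: str,
                 llm: Callable[[str], str], max_iters: int = 3) -> str:
    """Generate -> execute -> reflect loop in the spirit of ELLMPEG (our
    sketch, not the authors' implementation). `llm` is any local text
    generator, e.g. a wrapper around a llama.cpp or Ollama endpoint."""
    prompt = (f"{rag_context}\n\nTask: {user_query}\n"
              "Return exactly one FFmpeg command.")
    for _ in range(max_iters):
        cmd = llm(prompt).strip()
        error = validate_command(cmd.split())
        if error is None:          # command executed cleanly at the edge
            return cmd
        # Self-reflection: feed the tool's error output into the next attempt.
        prompt += f"\nYour last command failed with:\n{error}\nFix it."
    raise RuntimeError("no valid command within the iteration budget")
```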