Paper title: EVLM: Intent-Driven Edge Vision Language Model for UAV-Based Power Line Inspection

Authors: Reza Farahani (DSG, TU Wien, Austria), Zoha Azimi (Christian Doppler Laboratory ATHENA, ITEC, University of Klagenfurt, Austria), Ilir Murturi (Department of Mechatronics, University of Prishtina, Kosova), Arda Goknil (SINTEF, Oslo, Norway), Sagar Sen (SINTEF, Oslo, Norway), Christian Timmerer (Christian Doppler Laboratory ATHENA, ITEC, University of Klagenfurt, Austria), Schahram Dustdar (DSG, TU Wien, Austria)

Conference: 2026 IEEE International Conference on Edge Computing and Communications (IEEE EDGE 2026)

Abstract: Inspection of critical infrastructure, such as power lines, is increasingly conducted using unmanned aerial vehicles (UAVs) that capture aerial video for subsequent human review. Although recent edge-based approaches deploy onboard object detectors to identify predefined defect classes, these pipelines remain closed-set, task-specific, and largely decoupled from operator intent and edge resource constraints. This paper introduces EVLM, an intent-driven vision-language framework for onboard UAV-based power line inspection. Given a high-level operator intent, EVLM (i) uses lightweight histogram-based frame filtering to extract salient key frames under bounded compute budgets, (ii) executes a domain-adapted vision-language model (VLM) directly on the UAV for intent-conditioned multimodal reasoning, and (iii) synthesizes structured inspection reports together with a minimal set of evidence frames, replacing continuous raw video transmission with compact semantic outputs. To align the VLM with infrastructure inspection semantics while preserving edge efficiency, we perform parameter-efficient fine-tuning using Low-Rank Adaptation (LoRA), enabling domain specialization without updating the full set of model parameters. We implement and fully deploy EVLM on an NVIDIA Jetson device representative of UAV-class onboard hardware and evaluate it on 20 publicly released power line inspection video sequences spanning 8 heterogeneous environments and 5 operational intent categories. Experimental results show a data reduction of 94.8%, with transmitted data decreasing from 485 kB to 25 kB per 4 s segment, corresponding to 72.75 MB versus 3.75 MB over a 10 min inspection mission. EVLM runs on embedded hardware with moderate CPU/GPU utilization and bounded power consumption (5.6 W), while producing interpretable, intent-aligned inspection outputs with richer semantic insights than detection-centric baselines.
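The abstract does not spell out EVLM's filtering rule. As a minimal sketch, assuming grayscale frames given as flat lists of 8-bit pixel values, a 16-bin intensity histogram, and a half-L1 histogram distance, histogram-based key-frame selection under a frame budget could look like the following. The function names, `threshold`, and `max_frames` are hypothetical, not EVLM's interface.

```python
from typing import List, Sequence

# Hypothetical frame representation: a flat list of 8-bit grayscale pixel values.
Frame = Sequence[int]


def gray_histogram(frame: Frame, bins: int = 16) -> List[float]:
    """Normalized intensity histogram with `bins` equal-width bins over 0..255."""
    counts = [0] * bins
    for px in frame:
        counts[min(px * bins // 256, bins - 1)] += 1
    total = len(frame)
    return [c / total for c in counts]


def hist_distance(h1: List[float], h2: List[float]) -> float:
    """Half the L1 distance between normalized histograms (0 = identical, 1 = disjoint)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))


def select_key_frames(frames: List[Frame], threshold: float = 0.3,
                      max_frames: int = 8) -> List[int]:
    """Keep a frame when its histogram differs enough from the last kept frame.

    `max_frames` caps how many frames are forwarded to the VLM, standing in
    for a bounded compute budget (an assumption for illustration).
    """
    if not frames:
        return []
    kept = [0]  # always keep the first frame
    last_hist = gray_histogram(frames[0])
    for i in range(1, len(frames)):
        if len(kept) >= max_frames:
            break
        h = gray_histogram(frames[i])
        if hist_distance(h, last_hist) >= threshold:
            kept.append(i)
            last_hist = h
    return kept
```

For example, `select_key_frames([[0] * 100, [0] * 100, [255] * 100])` keeps frames 0 and 2: the second frame's histogram matches the first, while the third differs completely. Comparing against the last kept frame, rather than the immediately preceding one, lets gradual scene changes, where each consecutive difference is small, still eventually trigger a new key frame.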

Quantifying Inter-City Network Latency in Europe: A Measurement based Study for Time-Critical Cloud Services

3rd Workshop on Engineering Techniques for Distributed Computing Continuum Systems (EDCCS), 22-25 June 2026, Seoul, South Korea

Authors: Thomas Schleicher, Kurt Horvath, Dragi Kimovski, Bernd Spiess, Oliver Hohlfeld

Abstract: Time-critical cloud and edge services depend on predictable, low-latency wide-area connectivity, yet inter-city network behavior often deviates from expectations based on geographic distance alone. This paper presents an evaluation framework and results on inter-city network latency across major European metropolitan areas, treating latency as a non-functional property relevant to benchmarking and service placement in cloud computing. We develop a scalable measurement framework based on a distributed probing infrastructure, analyze round-trip latency, and assess spatial efficiency and temporal stability. Initial results reveal unexpectedly high latency on long-distance paths from the Iberian Peninsula toward Turkey. Distance-normalized analysis further exposes pronounced inefficiencies on short-distance paths between Greece and Turkey, suggesting network effects that geographic distance alone does not explain. Temporal analysis shows elevated latency variance and instability on paths involving Turkey, while most other inter-city connections closely follow distance-based expectations and remain stable over time. These findings highlight the importance of distance-normalized and stability-aware metrics for evaluating wide-area cloud connectivity. The presented methodology and results provide practical insight for benchmarking, placement, and operation of latency-sensitive cloud services across geographically distributed infrastructures.
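The abstract does not define its distance-normalized metric. One common formulation, used here as an assumption, divides the measured RTT by the ideal RTT over the great-circle path at roughly 200,000 km/s (about two thirds of the speed of light, a typical figure for optical fiber); values well above 1 point to indirect routing or other non-distance effects. The coefficient of variation is a simple, hypothetical stand-in for the paper's temporal-stability analysis, and all function names are illustrative.

```python
import math
import statistics
from typing import Sequence

EARTH_RADIUS_KM = 6371.0
# Assumption: signals in optical fiber travel at roughly 2/3 c, i.e. ~200 km per ms.
FIBER_SPEED_KM_PER_MS = 200.0


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine distance in km between two (lat, lon) points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def latency_stretch(rtt_ms: float, distance_km: float) -> float:
    """Measured RTT divided by the ideal fiber RTT over the great-circle path."""
    if distance_km <= 0:
        raise ValueError("distance must be positive")
    ideal_rtt_ms = 2 * distance_km / FIBER_SPEED_KM_PER_MS  # out and back
    return rtt_ms / ideal_rtt_ms


def coefficient_of_variation(rtts_ms: Sequence[float]) -> float:
    """Population standard deviation over mean: a simple stability indicator."""
    return statistics.pstdev(rtts_ms) / statistics.fmean(rtts_ms)
```

For instance, an RTT of 20 ms between two cities 1,000 km apart gives a stretch of 2.0, since the ideal fiber RTT over that distance is 10 ms.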

Energy and Compression Efficiency in Large-Scale Video Streaming

IEEE International Conference on Image Processing (ICIP 2026)

Authors: Mohammad Ghasempour (AAU, Austria), Hadi Amirpour (AAU, Austria), and Christian Timmerer (AAU, Austria)

Abstract: The rise in large-scale video streaming has led to increased energy demands across the encoding, transmission, and decoding pipeline. While energy consumption in video streaming has been widely studied, encoding decisions are typically made without explicitly accounting for expected content demand. As a result, the impact of view count on energy consumption and compression efficiency remains largely unexplored. This limits the ability to make informed and efficient encoding decisions in real-world streaming scenarios. In this paper, we propose EcoEncode, an analytical framework to evaluate the impact of view count on codec-level encoding decisions and the resulting trade-offs between energy consumption and compression efficiency. We further show that these decisions depend on video content characteristics and encoding configurations. Based on our findings, we provide practical insights to guide the selection of codecs and presets. Experimental results show that view count is a key factor in codec-level decisions. For low-popularity videos, EcoEncode achieves up to 99% energy savings with only 1-4 VMAF points of quality loss. Across all scenarios, the selected configurations lie on or near the Pareto frontier, and EcoEncode improves quality by up to 14 VMAF points over the least energy-consuming configuration.
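To illustrate why view count can change which configuration is preferable, the sketch below assumes a deliberately simple energy model: encoding energy is paid once, while transmission and decoding energy is paid per view. This is a hypothetical simplification, not EcoEncode's model, and the `Config` values in the usage note are illustrative rather than measured.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Config:
    """A hypothetical codec/preset operating point (illustrative values only)."""
    name: str
    encode_energy_j: float    # one-off encoding energy
    per_view_energy_j: float  # assumed transmission + decoding energy per view
    vmaf: float


def total_energy(cfg: Config, views: int) -> float:
    """Assumed model: encode once, then pay the per-view cost for every view."""
    return cfg.encode_energy_j + views * cfg.per_view_energy_j


def pareto_front(configs: List[Config], views: int) -> List[str]:
    """Names of configs not dominated in (lower total energy, higher VMAF)."""
    front = []
    for c in configs:
        e_c = total_energy(c, views)
        dominated = any(
            total_energy(o, views) <= e_c and o.vmaf >= c.vmaf
            and (total_energy(o, views) < e_c or o.vmaf > c.vmaf)
            for o in configs if o is not c
        )
        if not dominated:
            front.append(c.name)
    return front
```

With `slow = Config("slow", 100.0, 1.0, 90.0)` and `fast = Config("fast", 10.0, 2.0, 85.0)`, both lie on the front at 10 views (110 J vs. 30 J), but at 1,000 views the slow configuration uses less total energy (1,100 J vs. 2,010 J) and delivers higher quality, so it dominates.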

Title: Complexity prediction of hardware and software video transcoding in the cloud

Authors: Taieb Chachou, Sid Ahmed Fezza, Wassim Hamidouche, Ghalem Belalem, Hadi Amirpour

Abstract: Today, video content constitutes a significant portion of internet traffic. This content is viewed on a wide range of devices with varying characteristics and under different network conditions. Video transcoding is a crucial mechanism for adapting video content to this diverse array of devices and bandwidth constraints while ensuring the best possible user experience. However, video transcoding is computationally intensive, requiring scalable infrastructure such as cloud computing to handle the complexity and volume of tasks efficiently. In this paper, we propose a novel method to predict transcoding time across different types of platforms (CPU and GPU) and codecs (H.264/AVC and H.265/HEVC).

Unlike existing approaches that focus mainly on CPU-based transcoding, the proposed model explicitly considers hardware-accelerated (GPU) transcoding, where accelerators significantly influence video transcoding performance in cloud computing.

The predicted transcoding time can be utilized to optimize the scheduling of transcoding tasks in cloud computing, helping to ensure optimal load balancing and minimize total transcoding time while maintaining the highest video quality. The proposed solution consists of two essential phases: (i) dataset construction and (ii) model construction. The first phase involves video selection, segmentation, and video transcoding. The second phase focuses on analyzing the most important features that influence the prediction of transcoding time and developing a machine learning-based model for accurate video transcoding time prediction. Experimental results demonstrate that the XGBoost model achieves superior prediction accuracy across both software and hardware codecs, with a global coefficient of determination of R² = 0.993 when evaluated on the complete dataset, which includes video segments transcoded using H.264/AVC and H.265/HEVC codecs on CPU and GPU platforms. This performance represents an improvement of approximately 7.45% compared to state-of-the-art methods.
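The abstract mentions using predicted transcoding times for load balancing but does not describe a scheduler. A minimal sketch, assuming the predictions are already available as seconds per task, is the classic longest-processing-time-first (LPT) greedy heuristic, which assigns each task to the currently least-loaded worker in order of decreasing predicted duration. LPT is a swapped-in standard technique, not the paper's method, and it does not guarantee an optimal makespan; `schedule_lpt` is a hypothetical name.

```python
import heapq
from typing import Dict, List, Tuple


def schedule_lpt(predicted_s: List[float],
                 workers: int) -> Tuple[float, Dict[int, List[int]]]:
    """Longest-processing-time-first greedy assignment.

    Tasks (indices into `predicted_s`) are taken in order of decreasing
    predicted duration; each goes to the worker with the smallest current load.
    Returns the makespan and the task indices assigned to each worker.
    """
    heap = [(0.0, w) for w in range(workers)]  # (current load, worker id)
    assignment: Dict[int, List[int]] = {w: [] for w in range(workers)}
    order = sorted(range(len(predicted_s)), key=lambda i: predicted_s[i], reverse=True)
    for task in order:
        load, w = heapq.heappop(heap)
        assignment[w].append(task)
        heapq.heappush(heap, (load + predicted_s[task], w))
    makespan = max(load for load, _ in heap)
    return makespan, assignment
```

For predicted times `[4, 3, 2, 1]` on two workers, `schedule_lpt` returns a makespan of `5.0` with the assignment `{0: [0, 3], 1: [1, 2]}`.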

Title: Token-Wise Attention-Guided Semantic Quality Assessment for Compressed Visual Features

Authors: Shien Ke, Changsheng Gao, Hadi Amirpour, Zhihua Wang, Xiaoyan Sun

Event: QoMEX 2026, Cardiff, UK, June 29th – July 3rd, 2026

Abstract: In collaborative and distributed intelligent systems, compressed intermediate features are routinely transmitted and reused, making semantic quality assessment (SQA) crucial for reliable deployment. Recent compressed feature quality assessment (CFQA) benchmarks, however, show that conventional similarity measures often correlate poorly with downstream semantic utility and lack robustness across diverse feature codecs. In this paper, we propose a token-wise, attention-guided method for assessing the semantic quality of compressed features. First, motivated by the observation that many downstream heads normalize and process tokens largely independently, we assess quality at the token level. This token-wise formulation exploits the intrinsic correspondence between the original and reconstructed tokens while reducing cross-token interference. Second, since tokens contribute unequally to downstream task performance, we adopt an attention-guided aggregation scheme: we derive task-adaptive importance weights from DINOv2 self-attention and use them to pool token-wise quality predictions into a global semantic quality score. Third, to accommodate heterogeneous supervision across tasks, we cast CFQA as a regression problem and rescale classification-based rank targets to mitigate label imbalance. Experiments on the CFQA benchmark demonstrate that our method consistently improves PLCC and SROCC across three tasks and four codecs, yielding a practical, codec-agnostic quality interface for next-generation intelligent systems.
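The attention-guided aggregation step can be sketched as an attention-weighted average of per-token quality scores. In the paper, token-level quality is predicted by a learned model and the weights are derived from DINOv2 self-attention. In this sketch, cosine similarity between each original and reconstructed token is a hypothetical stand-in for token-level quality, and the weights are passed in directly; the function names are illustrative.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def pooled_quality(original: List[Vector], reconstructed: List[Vector],
                   attention: Sequence[float]) -> float:
    """Attention-weighted mean of per-token similarities.

    `attention` holds one non-negative importance weight per token (at least
    one must be positive); weights are renormalized to sum to 1 before pooling.
    """
    if not (len(original) == len(reconstructed) == len(attention)):
        raise ValueError("token counts must match")
    total = sum(attention)
    per_token = [cosine(o, r) for o, r in zip(original, reconstructed)]
    return sum(w / total * q for w, q in zip(attention, per_token))
```

With two tokens, one reconstructed perfectly (similarity 1) and one orthogonal to its original (similarity 0), weights `[3, 1]` give a pooled score of 0.75, whereas uniform pooling would give 0.5: the more important token dominates the result.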

Title: Quality-Complexity Trade-off for Sustainable Media Delivery

Authors: Hadi Amirpour, Christian Herglotz, Lingfeng Qu, Wei Zhou, Christian Timmerer

Event: QoMEX 2026, Cardiff, UK, June 29th – July 3rd, 2026

Abstract: Sustainable media delivery increasingly requires joint optimization across perceptual quality, bitrate, and computational cost, yet codec comparisons are often reported only in rate-distortion terms without accounting for energy and (en/de)coding time overheads at scale. This paper analyzes quality–rate–computational cost trade-offs using a large-scale dataset. We first quantify the dominant drivers of bitrate, VMAF, and (en/de)coding user time via interpretable regression models, showing that codec and resolution explain a substantial fraction of the observed variance. We then characterize local sensitivities of bitrate and (en/de)coding user time to incremental increases in VMAF using interpolation in the quality domain and finite-difference derivatives, providing a content-agnostic view of how much additional bitrate and compute, and consequently energy, is required per unit of quality improvement. To evaluate practical savings, we compute Bjøntegaard Delta metrics relative to a libx264 reference, revealing that large BD-Rate gains can coincide with substantial penalties in (en/de)coding user time, particularly for the most recent video coding standards such as Versatile Video Coding (VVC). Finally, we formulate multi-objective configuration selection as a Binary Linear Program (BLP) that selects one operating point per video by trading perceptual quality against bitrate and (en/de)coding user time; across different weight regimes, the selected codec-resolution-frame-rate distributions shift coherently with system priorities.
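The abstract does not specify the interpolation scheme. Assuming piecewise-linear interpolation of a rate-quality curve in the VMAF domain, forward differences on a uniform VMAF grid estimate how many extra kbps each additional VMAF point costs; the same function applies unchanged to (en/de)coding time. The function names and the sample points in the usage note are illustrative.

```python
from bisect import bisect_right
from typing import List, Sequence


def interp(xs: Sequence[float], ys: Sequence[float], x: float) -> float:
    """Piecewise-linear interpolation; `xs` must be strictly increasing (len >= 2)."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x outside the measured quality range")
    i = min(bisect_right(xs, x), len(xs) - 1)  # right end of the enclosing segment
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def bitrate_per_vmaf(vmaf: Sequence[float], kbps: Sequence[float],
                     start: float, stop: float, step: float = 1.0) -> List[float]:
    """Forward-difference estimates of d(bitrate)/d(VMAF) on a uniform VMAF grid."""
    n = int((stop - start) // step)
    return [
        (interp(vmaf, kbps, start + (k + 1) * step) - interp(vmaf, kbps, start + k * step)) / step
        for k in range(n)
    ]
```

For a toy linear curve through (60, 6000), (80, 8000), and (95, 9500) kbps, `bitrate_per_vmaf(..., 60, 90, 5)` returns six estimates of 100 kbps per VMAF point; on a real, convex rate-quality curve the estimates would grow toward high VMAF.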

Title: EPS: Efficient Patch Sampling for Video Overfitting in Deep Super-Resolution Model Training

Authors: Yiying Wei, Hadi Amirpour, Jong Hwan Ko, and Christian Timmerer

Abstract: Leveraging the overfitting property of deep neural networks (DNNs) is an emerging trend in video delivery systems to enhance video quality within bandwidth limits. Existing approaches transmit overfitted super-resolution (SR) model streams for low-resolution (LR) bitstreams, which are used to reconstruct high-resolution (HR) videos at the decoder. Although these approaches show promising results, the high computational cost of training on a large number of video frames limits their practical application. To overcome this challenge, we propose an efficient patch sampling method named EPS for video SR network overfitting, which identifies the most valuable training patches from video frames.

To this end, we first present two low-complexity Discrete Cosine Transform (DCT)-based spatial-temporal features to measure the complexity score of each patch directly. By analyzing the histogram distribution of these features, we then categorize all possible patches into different clusters and select training patches from the cluster with the highest spatial-temporal information. The number of sampled patches is adaptive based on the video content, addressing the trade-off between training complexity and efficiency.

Our method reduces the number of training patches by 75.00% to 91.69%, depending on the resolution and number of clusters, while preserving high video quality and greatly improving training efficiency. Our method speeds up patch sampling by up to 82.1× compared to the state-of-the-art patch sampling technique (EMT).
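The following is a minimal sketch of DCT-based patch scoring and histogram-based selection. The exact EPS features and clustering are not given in the abstract, so the choices here are assumptions: the spatial score is the sum of absolute AC coefficients of the patch's 2-D DCT, the temporal score is the sum of absolute DCT coefficients of the difference to the co-located patch in the previous frame, and "clusters" are equal-width bins of the combined score. The naive O(N⁴) DCT is for clarity only, and all names are hypothetical.

```python
import math
from typing import List, Sequence

Patch = Sequence[Sequence[float]]  # N x N luma samples


def dct2(patch: Patch) -> List[List[float]]:
    """Unnormalized 2-D DCT-II of a square patch (naive, for clarity)."""
    n = len(patch)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                cu = math.cos(math.pi * (2 * x + 1) * u / (2 * n))
                for y in range(n):
                    s += patch[x][y] * cu * math.cos(math.pi * (2 * y + 1) * v / (2 * n))
            out[u][v] = s
    return out


def complexity(cur: Patch, prev: Patch) -> float:
    """Spatial score (sum of |AC| coefficients of the patch) plus temporal score
    (sum of |DCT| coefficients of the difference to the previous co-located patch)."""
    c = dct2(cur)
    n = len(c)
    spatial = sum(abs(c[u][v]) for u in range(n) for v in range(n) if u or v)
    diff = [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(cur, prev)]
    temporal = sum(abs(val) for row in dct2(diff) for val in row)
    return spatial + temporal


def select_patches(cur: List[Patch], prev: List[Patch], clusters: int = 4) -> List[int]:
    """Bin scores into equal-width histogram bins and keep the top bin's patches."""
    scores = [complexity(c, p) for c, p in zip(cur, prev)]
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return list(range(len(scores)))  # no spread: keep everything
    width = (hi - lo) / clusters
    return [i for i, s in enumerate(scores)
            if min(int((s - lo) / width), clusters - 1) == clusters - 1]
```

With two flat 2×2 patches and one checkerboard patch, and no motion between frames, `select_patches` with two bins keeps only the checkerboard, since flat patches have (numerically) zero AC energy.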

Title: Perception-Inspired Network for Stereo Image Quality Assessment

Authors: Yongli Chang, Guanghui Yue, Bo Zhao, Li Yu, Yakun Ju, Hadi Amirpour, Moncef Gabbouj, and Wei Zhou

Abstract: Existing stereo image quality assessment (SIQA) methods generally have limitations in binocular fusion and fine-grained perception modeling. To address these issues, we propose a Perception-Inspired Network for SIQA that simulates binocular difference-guided fusion, high-frequency sensitivity, and hierarchical perception mechanisms of the human visual system (HVS). First, a difference-guided binocular fusion (DGBF) module is designed to mimic the binocular difference sensitivity mechanism, exploiting difference information at both the feature level and the image level to optimize binocular fusion. Furthermore, image distortions primarily affect high-frequency components, which are critical for perceptual quality. To reflect this, we propose a high-frequency enhancement module (HFEM) that simulates the human eye’s sensitivity to edge and texture distortions. Finally, to better achieve fine-grained perception modeling, we propose a hierarchical quality regression strategy that simulates the human perceptual process, from perceiving local details to forming a global quality judgment, yielding quality predictions more closely aligned with human subjective evaluation. Experimental results demonstrate that the proposed method outperforms mainstream approaches, achieving a PLCC of 0.9734 on the LIVE I database and 0.9632 on the LIVE II database.
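PLCC and SROCC, the correlation measures reported here and in the CFQA abstract above, can be computed as in the sketch below. Many IQA studies first fit a nonlinear (typically logistic) mapping from predicted to subjective scores before computing PLCC; this sketch omits that step, and the function names are illustrative.

```python
import math
from typing import List, Sequence


def plcc(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson linear correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def srocc(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return plcc(ranks(x), ranks(y))
```

SROCC only rewards correct ordering, so `srocc([1, 2, 3], [1, 4, 9])` is 1.0 even though the relationship is not linear, whereas PLCC measures linear agreement.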

Title: Dynamic Participatory Game Design with Local AI: From Interviews to Trauma-Aware Interactive Narratives

Authors: Kseniia Harshina, Tom Tucek, Mathias Lux

Location: TextStory 2026 – Delft, The Netherlands, March 2026

Abstract: We present a work-in-progress, trauma-aware participatory storytelling pipeline that uses a locally hosted large language model (LLM) as a neutral chatbot interviewer. The system supports self-paced narration without cloud processing, prioritizing privacy, data sovereignty, and participant control. Interview transcripts are transformed into a structured scene representation (extracted fields and dialogue prompts), which is then replayed through a lightweight prototype interface as an initial step toward interactive memory-based experiences. We report a small formative expert evaluation (n=2) focusing on perceived comfort, emotional safety, and usability. Participants described the interviewer as low-pressure and reflective, while highlighting limitations such as weak acknowledgement of long answers and occasional “forced turns.” We discuss design implications for narrative extraction, turn-taking, and staged evaluation in sensitive contexts, and outline next steps for community-informed studies with participants who have lived experience of displacement.

Title: Lightweight WebAssembly-Based Intrusion Detection for Zero Trust Edge Networks

Authors: Jonathan Weber (TU Wien, Austria), Ilir Murturi (University of Prishtina, Kosova), Xhevahir Bajrami (University of Prishtina, Kosova), Reza Farahani (University of Klagenfurt, Austria), Praveen Kumar Donta (Stockholm University, Sweden), Schahram Dustdar (TU Wien, Austria)

Venue: IEEE Access

Abstract: IoT devices deployed across computing continuum infrastructures present significant security challenges due to resource constraints and decentralization. Traditional centralized intrusion detection systems struggle in such environments because of limited connectivity, high latency, and single points of failure. To address these challenges, this article extends a learning-driven Zero Trust framework tailored to resource-constrained edge environments and proposes an approach for evaluating lightweight intrusion detection models in such environments. The extended approach comprises three layers, (i) compilation, (ii) execution, and (iii) measurement, and enables systematic evaluation of lightweight machine learning models for localized intrusion detection. It is implemented in Rust and WebAssembly to ensure portable, efficient, and isolated execution across heterogeneous devices. Using this framework, seven representative intrusion detection models (i.e., Decision Tree (DT), Random Forest (RF), k-Nearest Neighbor (KNN), Logistic Regression (LR), Artificial Neural Network (ANN), and Convolutional Neural Network (CNN) variants) were implemented and evaluated on the UNSW-NB15 dataset. Results show that RF achieved the best trade-off between detection accuracy and efficiency, while simpler models (DT and LR) offered near-instant inference with minimal resource usage, making them well suited to highly constrained devices. In contrast, more complex models such as deep neural networks and KNN introduced significant overhead for only modest accuracy gains. These findings underscore the need to balance accuracy and resource efficiency for effective Zero Trust edge security.
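The framework itself is implemented in Rust and WebAssembly; the Python sketch below only illustrates the idea of a measurement layer that records detection accuracy alongside per-sample inference latency. The `measure` and `stump` functions and the toy samples are hypothetical and not derived from UNSW-NB15.

```python
import time
from typing import Callable, List, Sequence, Tuple


def measure(predict: Callable[[Sequence[float]], int],
            samples: List[Sequence[float]],
            labels: List[int]) -> Tuple[float, float]:
    """Return (accuracy, mean per-sample inference latency in microseconds)."""
    correct = 0
    elapsed_ns = 0
    for x, y in zip(samples, labels):
        t0 = time.perf_counter_ns()
        pred = predict(x)  # only the model call is timed
        elapsed_ns += time.perf_counter_ns() - t0
        correct += int(pred == y)
    n = len(samples)
    return correct / n, elapsed_ns / n / 1000


def stump(x: Sequence[float]) -> int:
    """Hypothetical one-feature decision stump: flag a flow when feature 0 > 0.5."""
    return int(x[0] > 0.5)
```

For the toy samples `[[0.9], [0.1], [0.7], [0.2]]` with labels `[1, 0, 0, 0]`, `measure(stump, ...)` reports an accuracy of 0.75; the latency figure depends on the host and is what a per-device comparison of models would track.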