5-19 July, 2024, Niagara Falls, Canada

The first workshop on Surpassing Latency Limits in Adaptive Live Video Streaming (LIVES 2024) aims to bring together researchers and developers to satisfy the data-intensive processing requirements and QoE challenges of live video streaming applications by leveraging heuristic and learning-based approaches.

Delivering video content from a video server to viewers over the Internet is a time-consuming part of the streaming workflow and has to be handled carefully to offer an uninterrupted streaming experience. The end-to-end latency, i.e., from the camera capture to the user device, is particularly problematic for live streaming. Some streaming-based applications, such as virtual events, esports, online learning, gaming, webinars, and all-hands meetings, require low latency for their operation. Video streaming is ubiquitous in many applications, devices, and fields. Delivering high Quality-of-Experience (QoE) to the streaming viewers is crucial, while the large amount of data that must be processed to satisfy such QoE cannot be handled by manual, human-driven approaches alone. Satisfying the requirements of low-latency video streaming applications requires the entire streaming workflow to be optimized and streamlined, which includes media provisioning (capturing, encoding, packaging, and ingesting to the origin server), media delivery (from the origin to the CDN and from the CDN to the end users), and media playback (the end user's video player).


Hosted by SINTEF AS, the project meeting of Graph-Massivizer took place from February 07-09, 2024, in Trysil, Norway.
On February 09, a Joint Workshop of the projects UPCAST, enRichMyData and Graph-Massivizer took place to share knowledge across the projects related to data challenges and approaches, find synergies in technology and data sharing, and identify future collaborations.

The diveXplore video retrieval system, by Klaus Schoeffmann and Sahar Nasirihaghighi, was named the best ‘Video Question-Answering-Tool for Novices’ at the 13th Video Browser Showdown (VBS 2024), an international video search challenge held annually at the International Conference on Multimedia Modeling (MMM 2024), which took place this year in Amsterdam, The Netherlands. VBS 2024 was a 6-hour challenge with many search tasks of different types (known-item search/KIS, ad-hoc video search/AVS, question-answering/QA) in three different datasets, amounting to about 2500 hours of video content, some performed by experts and others by novices recruited from the conference audience.

The 13th Video Browser Showdown (VBS 2024) was held on 29th January, 2024, in Amsterdam, The Netherlands, at the International Conference on Multimedia Modeling (MMM 2024). 12 international teams (from Austria, China, Czech Republic, Germany, Greece, Iceland, Ireland, Italy, Singapore, Switzerland, The Netherlands, Vietnam) competed for about 6 hours to quickly and accurately solve many search tasks of different types (known-item search/KIS, ad-hoc video search/AVS, question-answering/QA) in three datasets with about 2500 hours of video content. As in previous years, this large-scale international video retrieval challenge was an exciting event that demonstrated the state-of-the-art performance of interactive video retrieval systems.

On February 1st, 2024, Sahar Nasirihaghighi presented our work on ‘Event Recognition in Laparoscopic Gynecology Videos with Hybrid Transformers’ at this year’s International Conference on Multimedia Modeling (MMM 2024) in Amsterdam, The Netherlands.

Authors: Sahar Nasirihaghighi, Negin Ghamsarian, Heinrich Husslein, Klaus Schoeffmann

Abstract: Analyzing laparoscopic surgery videos presents a complex and multifaceted challenge, with applications including surgical training, intra-operative surgical complication prediction, and post-operative surgical assessment. Identifying crucial events within these videos is a significant prerequisite in a majority of these applications. In this paper, we introduce a comprehensive dataset tailored for relevant event recognition in laparoscopic gynecology videos. Our dataset includes annotations for critical events associated with major intra-operative challenges and post-operative complications. To validate the precision of our annotations, we assess event recognition performance using several CNN-RNN architectures. Furthermore, we introduce and evaluate a hybrid transformer architecture coupled with a customized training-inference framework to recognize four specific events in laparoscopic surgery videos. Leveraging the Transformer networks, our proposed architecture harnesses inter-frame dependencies to counteract the adverse effects of relevant content occlusion, motion blur, and surgical scene variation, thus significantly enhancing event recognition accuracy. Moreover, we present a frame sampling strategy designed to manage variations in surgical scenes and the surgeons’ skill level, resulting in event recognition with high temporal resolution. We empirically demonstrate the superiority of our proposed methodology in event recognition compared to conventional CNN-RNN architectures through a series of extensive experiments.


An EU funding programme enabling researchers to set up their own interdisciplinary research networks in Europe and beyond. #COSTactions

Representing Ireland with Prof. Horacio González-Vélez of National College of Ireland at the partner meeting of the COST Action CERCIRAS – Connecting Education and Research Communities for an Innovative Resource Aware Society in Montpellier today.

Great alignment with several EU skills projects like ARISA – AI Skills, ESSA Software Skills, Digital4Business and Digital4Security by facilitating transversal insights.

ACM MMSys 2024, Bari, Italy, Apr. 15-18, 2024 

Authors: Emanuele Artioli (Alpen-Adria-Universität Klagenfurt, Austria), Farzad Tashtarian (Alpen-Adria-Universität Klagenfurt, Austria), and Christian Timmerer (Alpen-Adria-Universität Klagenfurt, Austria)

Abstract: As the popularity of video streaming entertainment continues to grow, understanding how users engage with the content and react to its changes becomes a critical success factor for every stakeholder. User engagement, i.e., the percentage of video the user watches before quitting, is central to customer loyalty, content personalization, ad relevance, and A/B testing. This paper presents DIGITWISE, a digital twin-based approach for modeling adaptive video streaming engagement. Traditional adaptive bitrate (ABR) algorithms assume that all users react similarly to video streaming artifacts and network issues, neglecting individual user sensitivities. DIGITWISE leverages the concept of a digital twin, a digital replica of a physical entity, to model user engagement based on past viewing sessions. The digital twin receives input about streaming events and utilizes supervised machine learning to predict user engagement for a given session. The system model consists of a data processing pipeline, machine learning models acting as digital twins, and a unified model to predict engagement. DIGITWISE employs the XGBoost model in both digital twins and unified models. The proposed architecture demonstrates the importance of personal user sensitivities, reducing user engagement prediction error by up to 5.8% compared to non-user-aware models. Furthermore, DIGITWISE can optimize content provisioning and delivery by identifying the features that maximize engagement, providing an average engagement increase of up to 8.6%.

Keywords: digital twin, user engagement, xgboost




We’re seeking a passionate researcher for a PhD role in “Efficient Algorithms and Accelerator Architectures for Distributed Edge AI Systems”. This unique position offers the chance to work under the esteemed supervision of Prof. Radu Prodan (AAU Klagenfurt) and Prof. Marcel Baunach (TU Graz), with my guidance at SAL.


What You Will Do:
– Design & implement innovative distributed AI methods and algorithms.
– Customize these methods for the unique constraints of edge devices and networks.
– Investigate novel accelerator architectures for embedded AI applications.
– Explore quantization methods, with a focus on training and fine-tuning on edge devices.
– Publish research in high-impact journals and present at international conferences.

🎓 Candidate Profile:
– Master’s degree in a relevant field.
– Strong in programming and machine learning.
– Excellent communication skills in English.

🌍 Important Residency Note: Applicants should not have resided or carried out main activities in Austria for more than 12 months in the 3 years immediately before the application deadline.

Apply Now! Be sure to follow the specific application process outlined at Crystalline Program Recruitment (the link is in the job description). https://lnkd.in/dBCY2xfe

ACM Mile High Video 2024 (mhv), Denver, Colorado, February 11-14, 2024

Authors: Vignesh V Menon, Prajit T Rajendran, Reza Farahani, Klaus Schoeffmann, Christian Timmerer

Abstract: The rise in video streaming applications has increased the demand for video quality assessment (VQA). In 2016, Netflix introduced Video Multi-Method Assessment Fusion (VMAF), a full reference VQA metric that strongly correlates with perceptual quality, but its computation is time-intensive. This paper proposes a Discrete Cosine Transform (DCT)-energy-based VQA with texture information fusion (VQ-TIF) model for video streaming applications that determines the visual quality of the reconstructed video compared to the original video. VQ-TIF extracts Structural Similarity (SSIM) and spatiotemporal features of the frames from the original and reconstructed videos and fuses them using a long short-term memory (LSTM)-based model to estimate the visual quality. Experimental results show that VQ-TIF estimates the visual quality with a Pearson Correlation Coefficient (PCC) of 0.96 and a Mean Absolute Error (MAE) of 2.71, on average, compared to the ground truth VMAF scores. Additionally, VQ-TIF estimates the visual quality 9.14 times faster than the state-of-the-art VMAF implementation, along with an 89.44% reduction in energy consumption, assuming an Ultra HD (2160p) display resolution.
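
For readers unfamiliar with the two evaluation metrics named in the abstract, the following is a minimal sketch of how a Pearson Correlation Coefficient and a Mean Absolute Error could be computed between predicted quality scores and ground-truth VMAF scores. It is not the authors' evaluation code; the function names and the sample scores are hypothetical and chosen only for illustration.

```python
import math


def pearson_correlation(predicted, reference):
    """Pearson Correlation Coefficient between two equal-length score lists."""
    n = len(predicted)
    if n != len(reference) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_p = sum(predicted) / n
    mean_r = sum(reference) / n
    covariance = sum((p - mean_p) * (r - mean_r)
                     for p, r in zip(predicted, reference))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    if var_p == 0 or var_r == 0:
        raise ValueError("PCC is undefined when one list is constant")
    return covariance / math.sqrt(var_p * var_r)


def mean_absolute_error(predicted, reference):
    """Average absolute difference between predicted and reference scores."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


# Hypothetical scores on the 0-100 VMAF scale (illustrative only, not from the paper).
predicted_scores = [92.0, 75.5, 60.0, 41.0]
vmaf_scores = [90.0, 78.0, 58.5, 45.0]

pcc = pearson_correlation(predicted_scores, vmaf_scores)
mae = mean_absolute_error(predicted_scores, vmaf_scores)
print(f"PCC = {pcc:.3f}, MAE = {mae:.2f}")
```

A PCC close to 1 indicates that the predicted scores rise and fall with the VMAF scores, while the MAE expresses the typical deviation in VMAF points.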