On June 14, 2024, Klaus Schöffmann and Cathal Gurrin (DCU, Ireland) gave a keynote talk titled “From Concepts to Embeddings. Charting the Use of AI in Digital Video and Lifelog Search Over the Last Decade” at the International Workshop on Multimodal Video Retrieval and Multimodal Language Modelling (MVRMLM’24), co-located with the ACM ICMR 2024 conference in Phuket, Thailand.


Here is the abstract of the talk:

In the past decade, the field of interactive multimedia retrieval has undergone a transformative evolution driven by advances in artificial intelligence (AI). This keynote talk will explore the journey from early concept-based retrieval systems to the sophisticated embedding-based techniques that dominate the landscape today. By examining the progression of such AI-driven approaches at both the VBS (Video Browser Showdown) and the LSC (Lifelog Search Challenge), we will highlight the pivotal role of comparative benchmarking in accelerating innovation and establishing performance standards. We will also look ahead to potential future developments in interactive multimedia retrieval benchmarking, including emerging trends, the integration of multimodal data, and the future comparative benchmarking challenges within our community.


On June 10, 2024, the 7th Lifelog Search Challenge (LSC 2024), an international competition on lifelog retrieval, took place as a workshop at the ACM International Conference on Multimedia Retrieval (ICMR 2024) in Phuket, Thailand. The LSC is organized by a large international team (Cathal Gurrin, Björn Þór Jónsson, Duc-Tien Dang-Nguyen, Jakub Lokoc, Klaus Schoeffmann, Minh-Triet Tran, Steve Hodges, Graham Healy, Luca Rossetto, and Werner Bailer) and attracted 21 teams from all around the world (Austria, Czechia, Germany, Iceland, Ireland, Italy, Netherlands, Norway, Portugal, Switzerland, and Vietnam). The competition tests how quickly and accurately state-of-the-art lifelog retrieval systems can solve search tasks (known-item search, ad-hoc search, visual question answering) in a shared dataset of about 720,000 images, collected by an anonymous lifelogger over 18 months. With the LIFEXPLORE system developed by Martin Rader, Mario Leopold, and Klaus Schöffmann, ITEC won this competition for the second time in a row and received the award for the Best LSC System. Congratulations!

From June 10, 2024 until June 14, 2024, the ACM International Conference on Multimedia Retrieval (ICMR 2024) took place in Phuket, Thailand. It was organized by Cathal Gurrin (DCU), Klaus Schoeffmann (ITEC, AAU), and Rachada Kongkachandra (Thammasat University). ICMR 2024 received 348 paper submissions and about 80 more to the nine co-located workshops (LSC’24, AI-SIPM’24, MORE’24, ICDAR’24, MAD’24, AIQAM’24, MUWS’24, R2B’24, and MVRMLM’24). The conference attracted about 202 on-site participants (including local organizers), with 10 oral sessions, an on-site and a virtual poster session, a demo session, a reproducibility session, two interesting keynotes about Multimodal Retrieval in Computer Vision (Mubarak Shah) and AI-Based Video Analytics (Supavadee Aramvith), a panel about LLM and Multimedia (Alan Smeaton), and four interesting tutorials.


At PCS 2024 (Picture Coding Symposium), held in Taichung, Taiwan, from June 12-14, Hadi Amirpour received the Best Paper Award for the paper “Beyond Curves and Thresholds – Introducing Uncertainty Estimation To Satisfied User Ratios for Compressed Video”, written together with Jingwen Zhu, Raimund Schatz, Patrick Le Callet, and Christian Timmerer. Congratulations!

To celebrate the 40th birthday of a video game classic, Lukas Lorber from Kleine Zeitung interviewed Felix Schniz about Tetris. The interview touches upon the Cold War history of the video game, the psychology behind the ‘Tetris Effect’, and various annotations by genre expert Felix Schniz about the secret behind the game’s ongoing success.

You can read the full interview here:


Authors: Yiying Wei (AAU, Austria), Hadi Amirpour (AAU, Austria), Ahmed Telili (INSA Rennes, France), Wassim Hamidouche (INSA Rennes, France), Guo Lu (Shanghai Jiao Tong University, China), and Christian Timmerer (AAU, Austria)

Venue: European Signal Processing Conference (EUSIPCO)

Abstract: Content-aware deep neural networks (DNNs) are trending in Internet video delivery. They enhance quality within bandwidth limits by transmitting videos as low-resolution (LR) bitstreams with overfitted super-resolution (SR) model streams to reconstruct high-resolution (HR) video on the decoder end. However, these methods underutilize spatial and temporal redundancy, compromising compression efficiency. In response, our proposed video compression framework introduces spatial-temporal video super-resolution (STVSR), which encodes videos into low spatial-temporal resolution (LSTR) content and a model stream, leveraging the combined spatial and temporal reconstruction capabilities of DNNs. Compared to the state-of-the-art approaches that consider only spatial SR, our approach achieves bitrate savings of 18.71% and 17.04% while maintaining the same PSNR and VMAF, respectively.

Authors: Mohammad Ghasempour (AAU, Austria), Yiying Wei (AAU, Austria), Hadi Amirpour (AAU, Austria), and Christian Timmerer (AAU, Austria)

Venue: European Signal Processing Conference (EUSIPCO)

Abstract: Video coding relies heavily on reducing spatial and temporal redundancy to enable efficient transmission. To tackle the temporal redundancy, each video frame is predicted from the previously encoded frames, known as reference frames. The quality of this prediction is highly dependent on the quality of the reference frames. Recent advancements in machine learning are motivating the exploration of frame synthesis to generate high-quality reference frames. However, the efficacy of such models depends on training with content similar to that encountered during usage, which is challenging due to the diverse nature of video data. This paper introduces a content-aware reference frame synthesis to enhance inter-prediction efficiency. Unlike conventional approaches that rely on pre-trained models, our proposed framework optimizes a deep learning model for each content by fine-tuning only the last layer of the model, requiring the transmission of only a few kilobytes of additional information to the decoder. Experimental results show that the proposed framework yields significant bitrate savings of 12.76%, outperforming its counterpart in the pre-trained framework, which only achieves 5.13% savings in bitrate.


Authors: Zoha Azimi, Amritha Premkumar, Reza Farahani, Vignesh V Menon, Christian Timmerer, Radu Prodan

Venue: 32nd European Signal Processing Conference (EUSIPCO’24)

Abstract: Traditional per-title encoding approaches aim to maximize perceptual video quality by optimizing resolutions for each bitrate ladder representation. However, ensuring acceptable decoding times in video streaming, especially with the increased runtime complexity of modern codecs like Versatile Video Coding (VVC) compared to predecessors such as High Efficiency Video Coding (HEVC), is essential, as it leads to diminished buffering time, decreased energy consumption, and an improved Quality of Experience (QoE). This paper introduces a decoding complexity-sensitive bitrate ladder estimation scheme designed to optimize adaptive VVC streaming experiences. We design a customized bitrate ladder for the device configuration, ensuring that the decoding time remains below the threshold to mitigate adverse QoE issues such as rebuffering and to reduce energy consumption. The proposed scheme utilizes an eXtended PSNR (XPSNR)-optimized resolution prediction for each target bitrate, ensuring the highest possible perceptual quality within the constraints of device resolution and decoding time. Furthermore, it employs XGBoost-based models for predicting XPSNR, QP, and decoding time, utilizing the Inter-4K video dataset for training. The experimental results indicate that our approach achieves an average 28.39% reduction in decoding time using the VVC Test Model (VTM). Additionally, it achieves bitrate savings of 3.7% and 1.84% to maintain almost the same PSNR and XPSNR, respectively, for a display resolution constraint of 2160p and a decoding time constraint of 32 s.
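
The selection step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Candidate` class, the function names, and all numbers in `EXAMPLE_PREDICTIONS` are hypothetical, and the predicted XPSNR and decoding-time values are simply assumed as inputs here, whereas the paper obtains them from XGBoost-based models. For each target bitrate, the sketch keeps only the resolutions that satisfy the display-resolution and decoding-time constraints and picks the one with the highest predicted XPSNR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One candidate resolution for a target bitrate (all values hypothetical)."""
    height: int           # vertical resolution in pixels, e.g. 1080
    xpsnr_db: float       # predicted XPSNR in dB (assumed to come from a model)
    decode_time_s: float  # predicted decoding time in seconds (assumed)


def pick_resolution(candidates, max_height, max_decode_time_s):
    """Return the feasible candidate with the highest predicted XPSNR, or None."""
    feasible = [
        c for c in candidates
        if c.height <= max_height and c.decode_time_s <= max_decode_time_s
    ]
    if not feasible:
        return None
    return max(feasible, key=lambda c: c.xpsnr_db)


def build_ladder(predictions, max_height, max_decode_time_s):
    """Map each target bitrate (kbps) to its best feasible candidate.

    Bitrates without any feasible candidate are left out of the ladder.
    """
    ladder = {}
    for bitrate_kbps, candidates in sorted(predictions.items()):
        best = pick_resolution(candidates, max_height, max_decode_time_s)
        if best is not None:
            ladder[bitrate_kbps] = best
    return ladder


# Made-up predictions for two target bitrates, for illustration only.
EXAMPLE_PREDICTIONS = {
    1000: [
        Candidate(540, 33.0, 8.0),
        Candidate(1080, 34.0, 20.0),
        Candidate(2160, 32.5, 45.0),
    ],
    8000: [
        Candidate(1080, 40.0, 22.0),
        Candidate(2160, 42.0, 40.0),
    ],
}

if __name__ == "__main__":
    # Constraints mirroring the abstract: 2160p display, 32 s decoding time.
    for bitrate, choice in build_ladder(EXAMPLE_PREDICTIONS, 2160, 32.0).items():
        print(bitrate, choice.height, choice.xpsnr_db, choice.decode_time_s)
```

In this toy data, the 2160p candidate at 8000 kbps has the highest predicted XPSNR but exceeds the 32 s decoding-time limit, so the 1080p candidate is selected instead.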




The Second Workshop on Serverless, Extreme-Scale, and Sustainable Graph Processing Systems (GraphSys ’24) took place in South Kensington, London, co-located with the 15th ACM/SPEC International Conference on Performance Engineering.

Reza Farahani gave a talk entitled “Serverless Workflow Management Systems on the Computing Continuum”.

Authors: Reza Farahani (AAU, Klagenfurt, Austria), Frank Loh (University of Würzburg, Germany), Dumitru Roman (Sintef, Oslo, Norway), Radu Prodan (AAU, Klagenfurt, Austria)

Abstract: The growing desire among application providers for a cost model based on pay-per-use, combined with the need for a seamlessly integrated platform to manage the complex workflows of their applications, has spurred the emergence of a promising computing paradigm known as serverless computing. Although serverless computing was initially considered for cloud environments, it has recently been extended to other layers of the computing continuum, i.e., edge and fog. This extension emphasizes that the proximity of computational resources to data sources can further reduce costs and improve performance and energy efficiency. However, orchestrating the computing continuum in complex application workflows, including a set of serverless functions, introduces new challenges. This paper investigates the opportunities and challenges introduced by serverless computing for workflow management systems (WMS) on the computing continuum. In addition, the paper provides a taxonomy of state-of-the-art WMSs and reviews their capabilities.

Furthermore, Reza Farahani and the backend Graph-Massivizer team met to discuss the Graph-Massivizer toolkit integration plan.


Dragi Kimovski co-chaired the 7th Workshop on Hot Topics in Cloud Computing Performance (HotCloudPerf 2024), held within the International Conference on Performance Engineering (ICPE). During the workshop, he presented a paper titled “Hypergraphs: Facilitating High-Order Modeling of the Computing Continuum.” This event, held at Imperial College London on May 11, 2024, focused on various aspects of cloud computing performance, including elasticity, performance isolation, and dependability.