Authors: Michael Seufert (University of Augsburg, Germany), Marius Spangenberger (University of Würzburg, Germany), Fabian Poignée (University of Würzburg, Germany), Florian Wamser (Lucerne University of Applied Sciences and Arts, Switzerland), Werner Robitza (AVEQ GmbH, Austria), Christian Timmerer (Christian Doppler-Labor ATHENA, Alpen-Adria-Universität, Austria), Tobias Hoßfeld (University of Würzburg, Germany)

Journal: ACM Transactions on Multimedia Computing Communications and Applications (ACM TOMM)

Abstract: Reaching close-to-optimal bandwidth utilization in Dynamic Adaptive Streaming over HTTP (DASH) systems can, in theory, be achieved with a small discrete set of bit rate representations. This includes typical bit rate ladders used in state-of-the-art DASH systems. In practice, however, we demonstrate that bandwidth utilization, and consequently the Quality of Experience (QoE), can be improved by offering a continuous set of bit rate representations, i.e., a continuous bit rate slide (COBIRAS). Moreover, we find that the buffer fill behavior of different standard adaptive bit rate (ABR) algorithms is sub-optimal in terms of bandwidth utilization. To overcome this issue, we leverage COBIRAS’ flexibility to request segments with any arbitrary bit rate and propose a novel ABR algorithm, MinOff, which helps maximize bandwidth utilization by minimizing download off-phases during streaming. To avoid extensive storage requirements with COBIRAS and to demonstrate the feasibility of our approach, we design and implement a proof-of-concept DASH system for video streaming that relies on just-in-time encoding (JITE), which reduces storage consumption on the DASH server. Finally, we conduct a performance evaluation on our testbed and compare a state-of-the-art DASH system with few bit rate representations and our JITE DASH system, which can offer a continuous bit rate slide, in terms of bandwidth utilization and video QoE for different ABR algorithms.
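
To make the utilization gap concrete, here is a minimal sketch that compares a discrete bit rate ladder with a continuous bit rate slide under a constant bandwidth. It is not the paper's MinOff algorithm: the ladder values, the bandwidth, and all function names are hypothetical, and the sketch assumes a purely rate-based choice. It ignores buffer dynamics and switching between representations, which the abstract notes can close the gap in theory.

```python
from bisect import bisect_right


def ladder_bitrate(ladder_kbps, bandwidth_kbps):
    """Highest ladder rung not exceeding the bandwidth (lowest rung as fallback)."""
    rungs = sorted(ladder_kbps)
    idx = bisect_right(rungs, bandwidth_kbps)
    return rungs[max(idx - 1, 0)]


def slide_bitrate(min_kbps, max_kbps, bandwidth_kbps):
    """Continuous slide: any bit rate in [min, max] may be requested,
    so the bandwidth is simply clamped into that range."""
    return min(max(bandwidth_kbps, min_kbps), max_kbps)


def utilization(bitrate_kbps, bandwidth_kbps):
    """Fraction of the available bandwidth used by the chosen bit rate."""
    return min(bitrate_kbps / bandwidth_kbps, 1.0)


# Hypothetical ladder and bandwidth, chosen only for illustration.
ladder = [500, 1000, 2500, 5000, 8000]
bandwidth = 4000

ladder_util = utilization(ladder_bitrate(ladder, bandwidth), bandwidth)
slide_util = utilization(slide_bitrate(500, 8000, bandwidth), bandwidth)

print(f"ladder: {ladder_util:.3f}, slide: {slide_util:.3f}")
```

With these assumed numbers, the ladder falls back to the 2500 kbit/s rung on a 4000 kbit/s link, while the slide can request exactly 4000 kbit/s.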

The final review of the DataCloud project, an EU-funded project in which Radu and his team were involved as partners, took place on 25.06.2024. The review was a complete success, showcasing the outstanding results achieved.

On June 14, 2024, Klaus Schöffmann, together with Cathal Gurrin from DCU, Ireland, gave a keynote talk titled “From Concepts to Embeddings. Charting the Use of AI in Digital Video and Lifelog Search Over the Last Decade” at the International Workshop on Multimodal Video Retrieval and Multimodal Language Modelling (MVRMLM’24), co-located with the ACM ICMR 2024 conference in Phuket, Thailand.

Link: https://mvrmlm2024.ecit.qub.ac.uk

Here is the abstract of the talk:

In the past decade, the field of interactive multimedia retrieval has undergone a transformative evolution driven by advances in artificial intelligence (AI). This keynote talk will explore the journey from early concept-based retrieval systems to the sophisticated embedding-based techniques that dominate the landscape today. By examining the progression of such AI-driven approaches at both the VBS (Video Browser Showdown) and the LSC (Lifelog Search Challenge), we will highlight the pivotal role of comparative benchmarking in accelerating innovation and establishing performance standards. We will also look ahead to potential future developments in interactive multimedia retrieval benchmarking, including emerging trends, the integration of multimodal data, and future comparative benchmarking challenges within our community.

 

From June 10, 2024 until June 14, 2024, the ACM International Conference on Multimedia Retrieval (ICMR 2024) took place in Phuket, Thailand. It was organized by Cathal Gurrin (DCU), Klaus Schoeffmann (ITEC, AAU), and Rachada Kongkachandra (Thammasat University). ICMR 2024 received 348 paper submissions and about 80 more to the nine co-located workshops (LSC’24, AI-SIPM’24, MORE’24, ICDAR’24, MAD’24, AIQAM’24, MUWS’24, R2B’24, and MVRMLM’24). The conference attracted about 202 on-site participants (including local organizers), with 10 oral sessions, an on-site and a virtual poster session, a demo session, a reproducibility session, two interesting keynotes about Multimodal Retrieval in Computer Vision (Mubarak Shah) and AI-Based Video Analytics (Supavadee Aramvith), a panel about LLM and Multimedia (Alan Smeaton), and four interesting tutorials.

Link: www.icmr2024.org

At the PCS 2024 (Picture Coding Symposium), held in Taichung, Taiwan from June 12-14, Hadi Amirpour received the Best Paper Award for the paper “Beyond Curves and Thresholds – Introducing Uncertainty Estimation To Satisfied User Ratios for Compressed Video” written together with Jingwen Zhu, Raimund Schatz, Patrick Le Callet and Christian Timmerer. Congratulations!

To celebrate the 40th birthday of a video game classic, Lukas Lorber from Kleine Zeitung interviewed Felix Schniz about Tetris. The interview touches upon the Cold War history of the video game, the psychology behind the ‘Tetris Effect’, and various annotations by genre expert Felix Schniz about the secret behind the game’s ongoing success.

You can read the full interview here: https://www.kleinezeitung.at/wirtschaft/gaming/18530006/40-jahre-tetris-aus-dem-kalten-krieg-in-die-unsterblichkeit.

 

Authors: Yiying Wei (AAU, Austria), Hadi Amirpour (AAU, Austria), Ahmed Telili (INSA Rennes, France), Wassim Hamidouche (INSA Rennes, France), Guo Lu (Shanghai Jiao Tong University, China), and Christian Timmerer (AAU, Austria)

Venue: European Signal Processing Conference (EUSIPCO)

Abstract: Content-aware deep neural networks (DNNs) are trending in Internet video delivery. They enhance quality within bandwidth limits by transmitting videos as low-resolution (LR) bitstreams with overfitted super-resolution (SR) model streams to reconstruct high-resolution (HR) video on the decoder end. However, these methods underutilize spatial and temporal redundancy, compromising compression efficiency. In response, our proposed video compression framework introduces spatial-temporal video super-resolution (STVSR), which encodes videos into low spatial-temporal resolution (LSTR) content and a model stream, leveraging the combined spatial and temporal reconstruction capabilities of DNNs. Compared to the state-of-the-art approaches that consider only spatial SR, our approach achieves bitrate savings of 18.71% and 17.04% while maintaining the same PSNR and VMAF, respectively.

Authors: Mohammad Ghasempour (AAU, Austria), Yiying Wei (AAU, Austria), Hadi Amirpour (AAU, Austria), and Christian Timmerer (AAU, Austria)

Venue: European Signal Processing Conference (EUSIPCO)

Abstract: Video coding relies heavily on reducing spatial and temporal redundancy to enable efficient transmission. To tackle the temporal redundancy, each video frame is predicted from the previously encoded frames, known as reference frames. The quality of this prediction is highly dependent on the quality of the reference frames. Recent advancements in machine learning are motivating the exploration of frame synthesis to generate high-quality reference frames. However, the efficacy of such models depends on training with content similar to that encountered during usage, which is challenging due to the diverse nature of video data. This paper introduces a content-aware reference frame synthesis to enhance inter-prediction efficiency. Unlike conventional approaches that rely on pre-trained models, our proposed framework optimizes a deep learning model for each content by fine-tuning only the last layer of the model, requiring the transmission of only a few kilobytes of additional information to the decoder. Experimental results show that the proposed framework yields significant bitrate savings of 12.76%, outperforming its counterpart in the pre-trained framework, which only achieves 5.13% savings in bitrate.

 

Authors: Zoha Azimi, Amritha Premkumar, Reza Farahani, Vignesh V Menon, Christian Timmerer, Radu Prodan

Venue: 32nd European Signal Processing Conference (EUSIPCO’24)

Abstract: Traditional per-title encoding approaches aim to maximize perceptual video quality by optimizing resolutions for each bitrate ladder representation. However, ensuring acceptable decoding times in video streaming, especially with the increased runtime complexity of modern codecs like Versatile Video Coding (VVC) compared to predecessors such as High Efficiency Video Coding (HEVC), is essential, as it leads to diminished buffering time, decreased energy consumption, and an improved Quality of Experience (QoE). This paper introduces a decoding complexity-sensitive bitrate ladder estimation scheme designed to optimize adaptive VVC streaming experiences. We design a customized bitrate ladder for the device configuration, ensuring that the decoding time remains below the threshold to mitigate adverse QoE issues such as rebuffering and to reduce energy consumption. The proposed scheme utilizes an eXtended PSNR (XPSNR)-optimized resolution prediction for each target bitrate, ensuring the highest possible perceptual quality within the constraints of device resolution and decoding time. Furthermore, it employs XGBoost-based models for predicting XPSNR, QP, and decoding time, utilizing the Inter-4K video dataset for training. The experimental results indicate that our approach achieves an average 28.39% reduction in decoding time using the VVC Test Model (VTM). Additionally, it achieves bitrate savings of 3.7% and 1.84% to maintain almost the same PSNR and XPSNR, respectively, for a display resolution constraint of 2160p and a decoding time constraint of 32 s.
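
The constrained selection described in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation. The `Candidate` type, the `build_ladder` function, and all numeric values are hypothetical. In the paper the XPSNR and decoding-time values come from XGBoost-based predictors, whereas here they are hard-coded stand-ins.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """One (bitrate, resolution) option with hypothetical predicted values."""
    bitrate_kbps: int
    height: int            # vertical resolution, e.g. 1080 for 1080p
    xpsnr_db: float        # stand-in for a predicted XPSNR
    decode_time_s: float   # stand-in for a predicted decoding time


def build_ladder(candidates, max_height, max_decode_time_s):
    """For each target bitrate, keep the resolution with the highest predicted
    XPSNR among candidates that fit the display resolution and whose predicted
    decoding time stays below the threshold."""
    best = {}
    for c in candidates:
        if c.height > max_height or c.decode_time_s >= max_decode_time_s:
            continue  # violates a device constraint
        current = best.get(c.bitrate_kbps)
        if current is None or c.xpsnr_db > current.xpsnr_db:
            best[c.bitrate_kbps] = c
    return {bitrate: best[bitrate].height for bitrate in sorted(best)}


# Hypothetical predictions, for illustration only.
candidates = [
    Candidate(1000, 1080, 38.0, 12.0),
    Candidate(1000, 2160, 37.2, 30.0),
    Candidate(4000, 1080, 41.0, 14.0),
    Candidate(4000, 1440, 41.8, 22.0),
    Candidate(4000, 2160, 42.5, 35.0),  # best quality, but too slow to decode
]

ladder = build_ladder(candidates, max_height=2160, max_decode_time_s=32.0)
print(ladder)
```

With these assumed values, the 4000 kbit/s rung falls back from 2160p to 1440p, because the 2160p candidate exceeds the 32 s decoding time constraint even though its predicted XPSNR is higher.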

 

The Second Workshop on Serverless, Extreme-Scale, and Sustainable Graph Processing Systems (GraphSys ’24) took place in South Kensington, London, co-located with the 15th ACM/SPEC International Conference on Performance Engineering.

Reza Farahani gave a talk entitled “Serverless Workflow Management Systems on the Computing Continuum”.

Authors: Reza Farahani (AAU, Klagenfurt, Austria), Frank Loh (University of Würzburg, Germany), Dumitru Roman (SINTEF, Oslo, Norway), Radu Prodan (AAU, Klagenfurt, Austria)

Abstract: The growing desire among application providers for a cost model based on pay-per-use, combined with the need for a seamlessly integrated platform to manage the complex workflows of their applications, has spurred the emergence of a promising computing paradigm known as serverless computing. Although serverless computing was initially considered for cloud environments, it has recently been extended to other layers of the computing continuum, i.e., edge and fog. This extension emphasizes that the proximity of computational resources to data sources can further reduce costs and improve performance and energy efficiency. However, orchestrating the computing continuum in complex application workflows, including a set of serverless functions, introduces new challenges. This paper investigates the opportunities and challenges introduced by serverless computing for workflow management systems (WMS) on the computing continuum. In addition, the paper provides a taxonomy of state-of-the-art WMSs and reviews their capabilities.

Furthermore, Reza Farahani and the Graph-Massivizer backend team met to discuss the integration plan for the Graph-Massivizer toolkit.