Authors: Auday Al-Dulaimy, Matthijs Jansen, Bjarne Johansson, Animesh Trivedi, Alexandru Iosup, Mohammad Ashjaei, Antonino Galletta, Dragi Kimovski, Radu Prodan, Konstantinos Tserpes, George Kousiouris, Chris Giannakos, Ivona Brandic, Nawfal Ali, Andre B. Bondi, Alessandro V. Papadopoulos

Journal “Internet of Things”: https://link.springer.com/journal/43926

Abstract:

In the era of the IoT revolution, applications are becoming ever more sophisticated and accompanied by diverse functional and non-functional requirements, including those related to computing resources and performance levels. Such requirements make the development and implementation of these applications complex and challenging. Computing models, such as cloud computing, can provide applications with on-demand computation and storage resources to meet their needs. Although cloud computing is a great enabler for IoT and endpoint devices, its limitations make it unsuitable to fulfill all design goals of novel applications and use cases. Instead of only relying on cloud computing, leveraging and integrating resources at different layers (like IoT, edge, and cloud) is necessary to form and utilize a computing continuum.

Integrating the layers of the computing continuum offers a wide range of innovative services, but it introduces new challenges (e.g., monitoring performance and ensuring security) that need to be investigated. A better grasp and more profound understanding of the computing continuum can guide researchers and developers in tackling and overcoming such challenges. Thus, this paper provides a comprehensive and unified view of the computing continuum. The paper discusses computing models in general with a focus on cloud computing, the computing models that emerged beyond the cloud, and the communication technologies that enable computing in the continuum. In addition, two novel reference architectures are presented in this work: one for edge-cloud computing models and the other for edge-cloud communication technologies. We demonstrate real use cases from different application domains (like industry and science) to validate the proposed reference architectures and show how these use cases map onto them. Finally, the paper highlights key points that express the authors’ vision for efficiently enabling and utilizing the computing continuum in the future.
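As a toy illustration of what “leveraging and integrating resources at different layers” can mean in practice, the sketch below places a task on one layer of a hypothetical IoT-edge-cloud continuum based on simple latency and compute constraints. All names and numbers are illustrative assumptions, not taken from the paper.

```python
# Hypothetical sketch: choosing a layer in the computing continuum for a task,
# based on simple latency and compute constraints. The layer parameters below
# are illustrative assumptions only.
from dataclasses import dataclass

@dataclass
class Layer:
    name: str
    network_latency_ms: float   # round-trip latency to reach this layer
    compute_capacity: float     # available compute, arbitrary units

@dataclass
class Task:
    compute_demand: float       # compute required, same arbitrary units
    latency_budget_ms: float    # maximum tolerable round-trip latency

def place(task: Task, layers: list[Layer]) -> Layer | None:
    """Pick the closest layer (lowest latency) that satisfies both constraints."""
    feasible = [l for l in layers
                if l.network_latency_ms <= task.latency_budget_ms
                and l.compute_capacity >= task.compute_demand]
    return min(feasible, key=lambda l: l.network_latency_ms, default=None)

continuum = [
    Layer("iot-device", 1, 1),
    Layer("edge", 10, 50),
    Layer("cloud", 80, 10_000),
]

# A latency-critical but light task lands on the edge; a heavy job with a
# loose latency budget falls through to the cloud.
print(place(Task(compute_demand=20, latency_budget_ms=30), continuum).name)    # edge
print(place(Task(compute_demand=500, latency_budget_ms=200), continuum).name)  # cloud
```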

The review of the DataCloud project (Radu and his team were involved as partners; the project was funded by the EU) took place on 25.06.2024 – the final review was a complete success, showcasing the outstanding results achieved.

Together with Cathal Gurrin from DCU, Ireland, on June 14, 2024, Klaus Schöffmann gave a keynote talk about “From Concepts to Embeddings. Charting the Use of AI in Digital Video and Lifelog Search Over the Last Decade” at the International Workshop on Multimodal Video Retrieval and Multimodal Language Modelling (MVRMLM’24), co-located with the ACM ICMR 2024 conference in Phuket, Thailand.

Link: https://mvrmlm2024.ecit.qub.ac.uk

Here is the abstract of the talk:

In the past decade, the field of interactive multimedia retrieval has undergone a transformative evolution driven by advances in artificial intelligence (AI). This keynote talk will explore the journey from early concept-based retrieval systems to the sophisticated embedding-based techniques that dominate the landscape today. By examining the progression of such AI-driven approaches at both the VBS (Video Browser Showdown) and the LSC (Lifelog Search Challenge), we will highlight the pivotal role of comparative benchmarking in accelerating innovation and establishing performance standards. We will also look ahead at potential future developments in interactive multimedia retrieval benchmarking, including emerging trends, the integration of multimodal data, and upcoming comparative benchmarking challenges within our community.
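As a rough illustration of the embedding-based retrieval paradigm the talk refers to, the following minimal sketch ranks video keyframes by cosine similarity to a free-text query in a shared embedding space. The encoder is a hypothetical stand-in (e.g., a CLIP-style joint text-image model), not a system from VBS or LSC.

```python
# Minimal sketch of embedding-based retrieval: rank keyframes by cosine
# similarity between a text query embedding and precomputed image embeddings.
# embed_text is a hypothetical stand-in for a joint text-image encoder.
import numpy as np

def cosine_similarity(q: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Cosine similarity between one query vector and a matrix of vectors."""
    q = q / np.linalg.norm(q)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    return X @ q

def search(query: str, frame_embeddings: np.ndarray, embed_text, top_k: int = 5):
    """Return indices of the top_k frames most similar to the query."""
    q = embed_text(query)                      # e.g., a CLIP-style text encoder
    scores = cosine_similarity(q, frame_embeddings)
    return np.argsort(scores)[::-1][:top_k]    # highest similarity first

# Usage with a toy random "encoder" so the sketch runs end to end
# (a real encoder would map semantically similar text and images close together):
rng = np.random.default_rng(0)
frames = rng.normal(size=(1000, 512))          # 1000 keyframes, 512-dim embeddings
toy_embed_text = lambda s: rng.normal(size=512)
print(search("a person riding a bike", frames, toy_embed_text))
```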


On June 10, 2024, the 7th Lifelog Search Challenge (LSC 2024), an international competition on lifelog retrieval, took place as a workshop at the ACM International Conference on Multimedia Retrieval (ICMR 2024) in Phuket, Thailand. The LSC is organized by a large international team (Cathal Gurrin, Björn Þór Jónsson, Duc-Tien Dang-Nguyen, Jakub Lokoc, Klaus Schoeffmann, Minh-Triet Tran, Steve Hodges, Graham Healy, Luca Rossetto, and Werner Bailer) and attracted 21 teams from all around the world (Austria, Czechia, Germany, Iceland, Ireland, Italy, Netherlands, Norway, Portugal, Switzerland, and Vietnam). The competition tests how quickly and accurately state-of-the-art lifelog retrieval systems can solve search tasks (known-item search, ad-hoc search, visual question answering) in a shared dataset of about 720,000 images collected by an anonymous lifelogger over 18 months. With the LIFEXPLORE system, developed by Martin Rader, Mario Leopold, and Klaus Schöffmann, ITEC won this competition for the second time in a row and received the Best LSC System award. Congratulations!

From June 10, 2024 until June 14, 2024, the ACM International Conference on Multimedia Retrieval (ICMR 2024) took place in Phuket, Thailand. It was organized by Cathal Gurrin (DCU), Klaus Schoeffmann (ITEC, AAU), and Rachada Kongkachandra (Thammasat University). ICMR 2024 received 348 paper submissions, with about 80 more submitted to the nine co-located workshops (LSC’24, AI-SIPM’24, MORE’24, ICDAR’24, MAD’24, AIQAM’24, MUWS’24, R2B’24, and MVRMLM’24). The conference attracted 202 on-site participants (including local organizers), with 10 oral sessions, an on-site and a virtual poster session, a demo session, a reproducibility session, two interesting keynotes about Multimodal Retrieval in Computer Vision (Mubarak Shah) and AI-Based Video Analytics (Supavadee Aramvith), a panel about LLMs and Multimedia (Alan Smeaton), and four tutorials.

Link: www.icmr2024.org

At PCS 2024 (Picture Coding Symposium), held in Taichung, Taiwan, from June 12 to 14, 2024, Hadi Amirpour received the Best Paper Award for the paper “Beyond Curves and Thresholds – Introducing Uncertainty Estimation To Satisfied User Ratios for Compressed Video”, written together with Jingwen Zhu, Raimund Schatz, Patrick Le Callet, and Christian Timmerer. Congratulations!

To celebrate the 40th birthday of a video game classic, Lukas Lorber from Kleine Zeitung interviewed Felix Schniz about Tetris. The interview touches upon the Cold War history of the video game, the psychology behind the ‘Tetris Effect’, and observations from genre expert Felix Schniz on the secret behind the game’s ongoing success.

You can read the full interview here: https://www.kleinezeitung.at/wirtschaft/gaming/18530006/40-jahre-tetris-aus-dem-kalten-krieg-in-die-unsterblichkeit.


Authors: Yiying Wei (AAU, Austria), Hadi Amirpour (AAU, Austria), Ahmed Telili (INSA Rennes, France), Wassim Hamidouche (INSA Rennes, France), Guo Lu (Shanghai Jiao Tong University, China), and Christian Timmerer (AAU, Austria)

Venue: European Signal Processing Conference (EUSIPCO)

Abstract: Content-aware deep neural networks (DNNs) are trending in Internet video delivery. They enhance quality within bandwidth limits by transmitting videos as low-resolution (LR) bitstreams with overfitted super-resolution (SR) model streams to reconstruct high-resolution (HR) video on the decoder end. However, these methods underutilize spatial and temporal redundancy, compromising compression efficiency. In response, our proposed video compression framework introduces spatial-temporal video super-resolution (STVSR), which encodes videos into low spatial-temporal resolution (LSTR) content and a model stream, leveraging the combined spatial and temporal reconstruction capabilities of DNNs. Compared to the state-of-the-art approaches that consider only spatial SR, our approach achieves bitrate savings of 18.71% and 17.04% while maintaining the same PSNR and VMAF, respectively.
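A minimal sketch of the decoder-side idea, under stated assumptions: the client receives the LSTR bitstream plus a small overfitted model stream, and the model upsamples the decoded content both spatially and temporally. The tiny network below is purely illustrative, not the authors' architecture.

```python
# Hypothetical sketch of the decoder side of an STVSR-based delivery scheme:
# a small, per-video overfitted network reconstructs higher spatial and
# temporal resolution from low spatial-temporal resolution (LSTR) content.
# Architecture and all names are illustrative assumptions.
import torch
import torch.nn as nn

class TinySTVSR(nn.Module):
    """Maps two stacked LR frames to three HR frames (2x space, 2x time)."""
    def __init__(self, scale: int = 2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(6, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3 * 3 * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),  # 36 channels -> 9 channels at 2x resolution
        )

    def forward(self, lr_pair: torch.Tensor) -> torch.Tensor:
        hr = self.body(lr_pair)  # (B, 9, 2H, 2W)
        # 9 channels = 3 frames x RGB: both inputs upscaled plus an
        # interpolated in-between frame.
        return hr.reshape(hr.shape[0], 3, 3, *hr.shape[-2:])

model = TinySTVSR()
# In deployment, the overfitted weights would arrive as the per-video model
# stream, e.g.: model.load_state_dict(torch.load("model_stream.pt"))
lr_pair = torch.rand(1, 6, 270, 480)   # two stacked 480x270 LR frames
frames = model(lr_pair)                # three reconstructed 960x540 frames
print(frames.shape)                    # torch.Size([1, 3, 3, 540, 960])
```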

Authors: Mohammad Ghasempour (AAU, Austria), Yiying Wei (AAU, Austria), Hadi Amirpour (AAU, Austria), and Christian Timmerer (AAU, Austria)

Venue: European Signal Processing Conference (EUSIPCO)

Abstract: Video coding relies heavily on reducing spatial and temporal redundancy to enable efficient transmission. To tackle temporal redundancy, each video frame is predicted from previously encoded frames, known as reference frames. The quality of this prediction is highly dependent on the quality of the reference frames. Recent advancements in machine learning are motivating the exploration of frame synthesis to generate high-quality reference frames. However, the efficacy of such models depends on training with content similar to that encountered during usage, which is challenging given the diverse nature of video data. This paper introduces a content-aware reference frame synthesis framework to enhance inter-prediction efficiency. Unlike conventional approaches that rely on pre-trained models, the proposed framework optimizes a deep learning model for each content by fine-tuning only the last layer of the model, requiring the transmission of only a few kilobytes of additional information to the decoder. Experimental results show that the proposed framework yields significant bitrate savings of 12.76%, outperforming its pre-trained counterpart, which achieves only 5.13% bitrate savings.
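The following sketch illustrates the per-content fine-tuning idea under simplifying assumptions: freeze a pre-trained network, adapt only its last layer to the current content, and transmit just those weights as side information. The stand-in network, loss, and training loop are hypothetical, not the paper's actual model.

```python
# Hypothetical sketch of per-content adaptation: freeze a pre-trained
# frame-synthesis network, fine-tune only its last layer, and ship just
# those few-kilobyte weights to the decoder. Everything here is a toy
# stand-in for illustration.
import io
import torch
import torch.nn as nn

synthesizer = nn.Sequential(            # stand-in for a pre-trained synthesis net
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 3, 3, padding=1),     # last layer: the only part we adapt
)

for p in synthesizer.parameters():      # freeze everything...
    p.requires_grad = False
last = synthesizer[-1]
for p in last.parameters():             # ...except the final layer
    p.requires_grad = True

opt = torch.optim.Adam(last.parameters(), lr=1e-4)
for _ in range(100):                    # per-content fine-tuning loop
    decoded = torch.rand(1, 3, 128, 128)  # toy stand-in for a decoded frame
    target = torch.rand(1, 3, 128, 128)   # toy stand-in for the original frame
    loss = nn.functional.l1_loss(synthesizer(decoded), target)
    opt.zero_grad(); loss.backward(); opt.step()

# Only the adapted last layer is transmitted to the decoder:
buf = io.BytesIO()
torch.save(last.state_dict(), buf)
print(f"side information: {buf.getbuffer().nbytes / 1024:.1f} KiB")  # a few KiB
```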

 

Authors: Zoha Azimi, Amritha Premkumar, Reza Farahani, Vignesh V Menon, Christian Timmerer, Radu Prodan

Venue: 32nd European Signal Processing Conference (EUSIPCO’24)

Abstract: Traditional per-title encoding approaches aim to maximize perceptual video quality by optimizing the resolution for each bitrate ladder representation. However, ensuring acceptable decoding times in video streaming is essential, especially given the increased runtime complexity of modern codecs like Versatile Video Coding (VVC) compared to predecessors such as High Efficiency Video Coding (HEVC): shorter decoding times mean less buffering, lower energy consumption, and an improved Quality of Experience (QoE). This paper introduces a decoding complexity-sensitive bitrate ladder estimation scheme designed to optimize adaptive VVC streaming experiences. We design a customized bitrate ladder for each device configuration, ensuring that the decoding time remains below a threshold to mitigate adverse QoE issues such as rebuffering and to reduce energy consumption. The proposed scheme utilizes an eXtended PSNR (XPSNR)-optimized resolution prediction for each target bitrate, ensuring the highest possible perceptual quality within the constraints of display resolution and decoding time. Furthermore, it employs XGBoost-based models for predicting XPSNR, QP, and decoding time, trained on the Inter-4K video dataset. Experimental results indicate that our approach achieves an average 28.39% reduction in decoding time using the VVC Test Model (VTM). Additionally, it achieves bitrate savings of 3.7% and 1.84% while maintaining almost the same PSNR and XPSNR, respectively, for a display resolution constraint of 2160p and a decoding time constraint of 32 s.
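To make the selection logic concrete, here is a minimal, hypothetical sketch of a decoding-complexity-aware bitrate ladder: for each target bitrate, it keeps only the resolutions whose predicted decoding time fits the device's threshold and picks the one with the highest predicted quality. The toy predictors below stand in for the paper's XGBoost models; all values are illustrative.

```python
# Hypothetical sketch of decoding-complexity-aware bitrate ladder construction.
# predict_xpsnr / predict_decode_time_s stand in for learned (e.g., XGBoost)
# models; resolutions, bitrates, and the toy predictors are illustrative.
RESOLUTIONS = [2160, 1440, 1080, 720, 540, 360]   # heights, capped by display
BITRATES_KBPS = [1000, 3000, 6000, 12000]

def build_ladder(predict_xpsnr, predict_decode_time_s,
                 display_height: int, max_decode_time_s: float):
    """For each bitrate, choose the feasible resolution with the best quality."""
    ladder = []
    for br in BITRATES_KBPS:
        candidates = [
            (predict_xpsnr(br, h), h)
            for h in RESOLUTIONS
            if h <= display_height
            and predict_decode_time_s(br, h) <= max_decode_time_s
        ]
        if candidates:                 # highest predicted XPSNR under constraints
            xpsnr, height = max(candidates)
            ladder.append((br, height, round(xpsnr, 2)))
    return ladder

# Toy predictors so the sketch runs end to end: quality grows with bitrate and
# prefers resolutions matched to the bitrate; decode time grows with both.
toy_xpsnr = lambda br, h: 30 + 8 * (br / 12000) ** 0.5 - 4 * abs(h / 1080 - br / 6000)
toy_time = lambda br, h: 0.01 * h * (1 + br / 12000)

# 2160p display with a 32 s decoding-time budget, as in the paper's setup:
for rung in build_ladder(toy_xpsnr, toy_time, display_height=2160, max_decode_time_s=32):
    print(rung)   # (bitrate_kbps, chosen_height, predicted_xpsnr)
```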