The Christian Doppler (CD) Laboratory ATHENA was established in October 2019 to tackle current and future research and deployment challenges of HTTP Adaptive Streaming (HAS) and emerging streaming methods. The goal of CD laboratories is to conduct application-oriented basic research, promote collaboration between universities and companies, and facilitate technology transfer. They are financed through a public-private partnership between the participating companies and the Christian Doppler Research Association, which is funded by the Federal Ministry for Digital and Economic Affairs and the National Foundation for Research, Technology, and Development (Nationalstiftung für Forschung, Technologie und Entwicklung; FTE). ATHENA is supported by Bitmovin as its company partner.

The CD laboratories have a duration of seven years and undergo rigorous scientific review after two and five years. This spring, the CD lab ATHENA completed its 5-year evaluation, and we have just received official notification from the Christian Doppler Research Association (CDG) that we have successfully passed the review. Consequently, it is time to briefly outline the main achievements during this second phase (i.e., years 2 to 5) of the CD lab ATHENA.

Before exploring the achievements, it’s important to highlight the ongoing relevance of research in video streaming, given its dominance in today’s Internet usage. The January 2024 Sandvine Internet Phenomena report revealed that video streaming accounts for 68% of fixed/wired and 64% of mobile Internet traffic. Specifically, Video on Demand (VoD) represents 54% of fixed/wired and 57% of mobile traffic, while live streaming contributes 14% of fixed/wired and 7% of mobile traffic. The major services in this domain include YouTube and Netflix, each commanding more than 10% of overall Internet traffic, with TikTok, Amazon Prime, and Disney+ also playing significant roles.

ATHENA is structured into four work packages, each with distinct objectives as detailed below:

  1. Content provisioning: Primarily involves video encoding for HAS, quality-aware encoding, learning-based encoding, and multi-codec HAS.
  2. Content delivery: Addresses HAS issues by utilizing edge computing, exchanging information between CDN/SDN and clients, providing network assistance for clients, and evaluating corresponding utilities.
  3. Content consumption: Focuses on bitrate adaptation schemes, playback improvements, context and user awareness, and studies on Quality of Experience (QoE).
  4. End-to-end aspects: Offers a comprehensive view of application and transport layer enhancements, Quality of Experience (QoE) models, low-latency HAS, and learning-based HAS.

During the second phase of ATHENA’s work, we achieved significant results, including publications in respected academic journals and conferences. Specifically, our publications were featured in key multimedia, signal processing, computer networks & wireless communication, and computing systems venues, as categorized by Google Scholar under engineering and computer science. Notable venues include IEEE Communications Surveys & Tutorials (impact factor: 35.6), IEEE Transactions on Image Processing (10.6), IEEE Internet of Things Journal (10.6), IEEE Transactions on Circuits and Systems for Video Technology (8.4), and IEEE Transactions on Multimedia (7.3).

Furthermore, we focused on technology transfer by submitting 16 invention disclosures, resulting in 13 patent applications (including provisionals). Collaborating with our company partner, we obtained 6 granted patents. Additionally, we’re pleased to report on the progress of our spin-off projects, the funding secured for the two FFG-funded projects APOLLO and GAIA, and an EU Horizon Europe-funded innovation action called SPIRIT.

The ATHENA team was also active in organizing scientific events such as workshops, special sessions, and special issues at IEEE ICME, ACM MM, ACM MMSys, ACM CoNEXT, IEEE ICIP, PCS, and IEEE Network. We also contributed to reproducibility in research through open-source tools (e.g., Video Complexity Analyzer and LLL-CAdViSE) and datasets (e.g., Video Complexity Dataset and Multi-Codec Ultra High Definition 8K MPEG-DASH Dataset), among others.

We also note our contributions to the application of AI in video coding and video streaming.

A major outcome of the second phase is the successful defense of the inaugural cohort of PhD students.

Two postdoctoral scholars have reached a significant milestone on their path toward habilitation.

During the second phase, each work package produced excellent publications in its domain, briefly highlighted in the following. Content provisioning (WP-1) focuses mainly on video coding for HAS (43 papers) and immersive media coding for streaming (4 papers). The former can be further subdivided into the following topic areas:

  • Video complexity: spatial and temporal feature extraction (4 papers)
  • Compression efficiency improvement of individual representations (1 paper)
  • Encoding parameter prediction for HAS (9 papers)
  • Efficient bitrate ladder construction (4 papers)
  • Fast multi-rate encoding (3 papers)
  • Data security and data hiding (7 papers)
  • Energy-efficient video encoding for HAS (4 papers)
  • Advancing video quality evaluation (7 papers)
  • Datasets (4 papers)

Content delivery (WP-2) dealt with SDN/CDN assistance for HAS, edge computing support for HAS, and network-embedded media streaming support, resulting in 21 papers. Content consumption (WP-3) worked on QoE enhancement mechanisms at the client side and QoE- and energy-aware content consumption (11 papers). Finally, end-to-end aspects (WP-4) produced 15 papers in the area of end-to-end QoE improvement in multimedia video streaming. In total, we reported 94 papers published/accepted for the ATHENA 5-year evaluation.

In this context, it is also important to highlight the collaboration within ATHENA, which has resulted in joint publications across the various work packages (WPs) and with other ITEC members, for example, with Prof. Schöffmann (FWF-funded project OVID) and within the FFG-funded projects APOLLO/GAIA and the EU-funded project SPIRIT. In addition, we would like to acknowledge our international collaborators, such as Prof. Hongjie He from Southwest Jiaotong University, Prof. Patrick Le Callet from the University of Nantes, Prof. Wassim Hamidouche from the Technology Innovation Institute (UAE), Dr. Sergey Gorinsky from IMDEA, Dr. Abdelhak Bentaleb from Concordia University, Dr. Raimund Schatz from AIT, and Prof. Pablo Cesar from CWI. We are also pleased to report the successful technology transfers to Bitmovin, particularly CAdViSE (WP-4) and WISH ABR (WP-3). Regular “Fun with ATHENA” meetups and break-out groups are used for in-depth discussions about innovations and potential technology transfers.

Over the next two years, the ATHENA project will prioritize the development of deep neural network/AI-based image and video coding within the context of HAS. This includes energy- and cost-aware video coding for HAS; immersive video coding, such as volumetric video and holography; QoE- and energy-aware content consumption for HAS, including energy-efficient, AI-based live video streaming; and generative AI for HAS.

Thanks to all current and former ATHENA team members: Samira Afzal, Hadi Amirpour, Jesús Aguilar Armijo, Emanuele Artioli, Christian Bauer, Alexis Boniface, Ekrem Çetinkaya, Reza Ebrahimi, Alireza Erfanian, Reza Farahani, Mohammad Ghanbari (late), Milad Ghanbari, Mohammad Ghasempour, Selina Zoë Haack, Hermann Hellwagner, Manuel Hoi, Andreas Kogler, Gregor Lammer, Armin Lachini, David Langmeier, Sandro Linder, Daniele Lorenzi, Vignesh V Menon, Minh Nguyen, Engin Orhan, Lingfeng Qu, Jameson Steiner, Nina Stiller, Babak Taraghi, Farzad Tashtarian, Yuan Yuan, and Yiying Wei. Finally, thanks to ITEC support staff Martina Steinbacher, Nina Stiller, Margit Letter, Marion Taschwer, and Rudolf Messner.

We also would like to thank the Christian Doppler Research Association for its continuous support and for organizing the review, and the reviewer for the constructive feedback!

On Friday, July 5, 2024, Tom Tuček and Felix Schniz visited the Kwadrat youth centre in Klagenfurt for a workshop on computer role-playing games. Together with a group of highly motivated youngsters (and Kwadrat staff members!), they analysed the opening sequence of the best-selling game Baldur’s Gate 3. Afterwards, they introduced the audience to the pen-and-paper roots of modern role-playing games and invited everybody to join a session of the classic Dungeons and Dragons, wonderfully hosted by Tom.

The workshop was well attended and well received. Further events to introduce the Klagenfurt youth to the wonders of computer game design are already being planned.


Authors: Michael Seufert (University of Augsburg, Germany), Marius Spangenberger (University of Würzburg, Germany), Fabian Poignée (University of Würzburg, Germany), Florian Wamser (Lucerne University of Applied Sciences and Arts, Switzerland), Werner Robitza (AVEQ GmbH, Austria), Christian Timmerer (Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität Klagenfurt, Austria), Tobias Hoßfeld (University of Würzburg, Germany)

Journal: ACM Transactions on Multimedia Computing, Communications, and Applications (ACM TOMM)

Abstract: Reaching close-to-optimal bandwidth utilization in Dynamic Adaptive Streaming over HTTP (DASH) systems can, in theory, be achieved with a small discrete set of bit rate representations. This includes typical bit rate ladders used in state-of-the-art DASH systems. In practice, however, we demonstrate that bandwidth utilization, and consequently the Quality of Experience (QoE), can be improved by offering a continuous set of bit rate representations, i.e., a continuous bit rate slide (COBIRAS). Moreover, we find that the buffer fill behavior of different standard adaptive bit rate (ABR) algorithms is sub-optimal in terms of bandwidth utilization. To overcome this issue, we leverage COBIRAS’ flexibility to request segments with any arbitrary bit rate and propose a novel ABR algorithm, MinOff, which helps maximize bandwidth utilization by minimizing download off-phases during streaming. To avoid extensive storage requirements with COBIRAS and to demonstrate the feasibility of our approach, we design and implement a proof-of-concept DASH system for video streaming that relies on just-in-time encoding (JITE), which reduces storage consumption on the DASH server. Finally, we conduct a performance evaluation on our testbed and compare a state-of-the-art DASH system with few bit rate representations and our JITE DASH system, which can offer a continuous bit rate slide, in terms of bandwidth utilization and video QoE for different ABR algorithms.
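To make the idea concrete, here is a minimal, hypothetical sketch of a MinOff-style decision rule; it is not the authors’ implementation, and the function name, parameters, and constants are our own assumptions. The point is that, freed from a discrete ladder, a client can pick any bit rate at or just below its measured throughput so that downloads never have to pause:

```python
# Hypothetical sketch of a MinOff-style ABR decision for a continuous
# bit rate slide (COBIRAS). Not the paper's code; names and constants
# are illustrative assumptions.

def next_bitrate_continuous(throughput_bps: float,
                            buffer_s: float,
                            target_buffer_s: float = 30.0) -> float:
    """Pick a continuous bit rate that keeps the download pipe busy.

    Below the buffer target, request below the measured throughput so the
    buffer fills while the client keeps downloading; at the target, request
    at ~throughput so the client stays in an on-phase instead of idling.
    """
    if buffer_s < target_buffer_s:
        # Spend only part of the throughput on media bits; the headroom
        # closes the buffer gap without interrupting the download.
        fill_fraction = buffer_s / target_buffer_s  # in [0, 1)
        return throughput_bps * (0.5 + 0.5 * fill_fraction)
    # Buffer at target: match the throughput, so download time per segment
    # equals its playback time and no off-phase occurs.
    return throughput_bps

if __name__ == "__main__":
    # E.g., 20 Mbit/s measured throughput and 12 s of buffered media
    rate = next_bitrate_continuous(20e6, 12.0)
    print(f"request next segment at {rate / 1e6:.1f} Mbit/s")  # 14.0 Mbit/s
```

With a conventional discrete ladder, the client would have to round such a request down to the nearest available representation and then idle once its buffer is full; a continuous slide removes both sources of wasted bandwidth.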

The final review of the EU-funded DataCloud project, in which Radu and his team were involved as partners, took place on 25.06.2024 and was a complete success, showcasing the outstanding results achieved.

On June 14, 2024, Klaus Schöffmann, together with Cathal Gurrin from DCU, Ireland, gave a keynote talk on “From Concepts to Embeddings: Charting the Use of AI in Digital Video and Lifelog Search Over the Last Decade” at the International Workshop on Multimodal Video Retrieval and Multimodal Language Modelling (MVRMLM’24), co-located with the ACM ICMR 2024 conference in Phuket, Thailand.

Link: https://mvrmlm2024.ecit.qub.ac.uk

Here is the abstract of the talk:

In the past decade, the field of interactive multimedia retrieval has undergone a transformative evolution driven by advances in artificial intelligence (AI). This keynote talk will explore the journey from early concept-based retrieval systems to the sophisticated embedding-based techniques that dominate the landscape today. By examining the progression of such AI-driven approaches at both the VBS (Video Browser Showdown) and the LSC (Lifelog Search Challenge), we will highlight the pivotal role of comparative benchmarking in accelerating innovation and establishing performance standards. We will also look ahead at potential future developments in interactive multimedia retrieval benchmarking, including emerging trends, the integration of multimodal data, and the future comparative benchmarking challenges within our community.


From June 10 to June 14, 2024, the ACM International Conference on Multimedia Retrieval (ICMR 2024) took place in Phuket, Thailand. It was organized by Cathal Gurrin (DCU), Klaus Schoeffmann (ITEC, AAU), and Rachada Kongkachandra (Thammasat University). ICMR 2024 received 348 paper submissions and about 80 more to the nine co-located workshops (LSC’24, AI-SIPM’24, MORE’24, ICDAR’24, MAD’24, AIQAM’24, MUWS’24, R2B’24, and MVRMLM’24). The conference attracted about 202 on-site participants (including local organizers) and featured 10 oral sessions, an on-site and a virtual poster session, a demo session, a reproducibility session, two interesting keynotes about Multimodal Retrieval in Computer Vision (Mubarak Shah) and AI-Based Video Analytics (Supavadee Aramvith), a panel about LLMs and Multimedia (Alan Smeaton), and four tutorials.

Link: www.icmr2024.org

At PCS 2024 (Picture Coding Symposium), held in Taichung, Taiwan, from June 12 to 14, Hadi Amirpour received the Best Paper Award for the paper “Beyond Curves and Thresholds – Introducing Uncertainty Estimation To Satisfied User Ratios for Compressed Video”, written together with Jingwen Zhu, Raimund Schatz, Patrick Le Callet, and Christian Timmerer. Congratulations!

To celebrate the 40th birthday of a video game classic, Lukas Lorber from Kleine Zeitung interviewed Felix Schniz about Tetris. The interview touches upon the Cold War history of the video game, the psychology behind the ‘Tetris Effect’, and observations by genre expert Felix Schniz on the secret behind the game’s ongoing success.

You can read the full interview here: https://www.kleinezeitung.at/wirtschaft/gaming/18530006/40-jahre-tetris-aus-dem-kalten-krieg-in-die-unsterblichkeit.


Authors: Yiying Wei (AAU, Austria), Hadi Amirpour (AAU, Austria), Ahmed Telili (INSA Rennes, France), Wassim Hamidouche (INSA Rennes, France), Guo Lu (Shanghai Jiao Tong University, China), and Christian Timmerer (AAU, Austria)

Venue: European Signal Processing Conference (EUSIPCO)

Abstract: Content-aware deep neural networks (DNNs) are trending in Internet video delivery. They enhance quality within bandwidth limits by transmitting videos as low-resolution (LR) bitstreams with overfitted super-resolution (SR) model streams to reconstruct high-resolution (HR) video on the decoder end. However, these methods underutilize spatial and temporal redundancy, compromising compression efficiency. In response, our proposed video compression framework introduces spatial-temporal video super-resolution (STVSR), which encodes videos into low spatial-temporal resolution (LSTR) content and a model stream, leveraging the combined spatial and temporal reconstruction capabilities of DNNs. Compared to the state-of-the-art approaches that consider only spatial SR, our approach achieves bitrate savings of 18.71% and 17.04% while maintaining the same PSNR and VMAF, respectively.
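As a rough illustration of the overfitting idea behind this line of work, the following minimal PyTorch sketch trains a toy super-resolution network on a single video and saves its weights as the “model stream” shipped with the low-resolution bitstream. It is our own simplified assumption, not the paper’s code, and it only upscales spatially, whereas STVSR also reconstructs temporal resolution:

```python
# Toy content-aware SR: overfit a tiny network to one video and ship its
# weights alongside the LR bitstream. Illustrative sketch only.
import torch
import torch.nn as nn

class TinySR(nn.Module):
    """Minimal x2 spatial SR net (real STVSR also upsamples in time)."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3 * 4, 3, padding=1),  # 4 = 2x2 upscaling factor
            nn.PixelShuffle(2),                  # rearrange to 2x resolution
        )

    def forward(self, lr):
        return self.body(lr)

def overfit_on_video(model, lr_frames, hr_frames, steps=100):
    """Content-aware training: overfit the SR model to this one video."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.L1Loss()
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(model(lr_frames), hr_frames)
        loss.backward()
        opt.step()
    return model

# Dummy tensors standing in for decoded LR frames and the HR originals.
lr = torch.rand(8, 3, 90, 160)   # 8 frames at 160x90
hr = torch.rand(8, 3, 180, 320)  # 2x target resolution
model = overfit_on_video(TinySR(), lr, hr)
torch.save(model.state_dict(), "sr_model_stream.pt")  # the "model stream"
```

Because the network only ever has to fit one specific video, it can stay small, which is what keeps the model stream cheap relative to the bitrate it saves.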

Authors: Mohammad Ghasempour (AAU, Austria), Yiying Wei (AAU, Austria), Hadi Amirpour (AAU, Austria), and Christian Timmerer (AAU, Austria)

Venue: European Signal Processing Conference (EUSIPCO)

Abstract: Video coding relies heavily on reducing spatial and temporal redundancy to enable efficient transmission. To tackle the temporal redundancy, each video frame is predicted from the previously encoded frames, known as reference frames. The quality of this prediction is highly dependent on the quality of the reference frames. Recent advancements in machine learning are motivating the exploration of frame synthesis to generate high-quality reference frames. However, the efficacy of such models depends on training with content similar to that encountered during usage, which is challenging due to the diverse nature of video data. This paper introduces content-aware reference frame synthesis to enhance inter-prediction efficiency. Unlike conventional approaches that rely on pre-trained models, the proposed framework optimizes a deep learning model for each content by fine-tuning only the last layer of the model, requiring the transmission of only a few kilobytes of additional information to the decoder. Experimental results show that the proposed framework yields significant bitrate savings of 12.76%, outperforming its pre-trained counterpart, which achieves only 5.13% bitrate savings.
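The key trick in the abstract, fine-tuning only the last layer so that just a few kilobytes reach the decoder, can be sketched as follows. This is a hypothetical stand-in model and training loop of our own, not the paper’s network:

```python
# Per-content fine-tuning of only the last layer of a frame-synthesis net,
# so only a few kilobytes of weights are sent to the decoder. Illustrative
# sketch with assumed shapes; not the paper's model.
import torch
import torch.nn as nn

# Stand-in for a pre-trained reference frame synthesis network: it maps two
# stacked reference frames (6 channels) to one synthesized frame (3 channels).
model = nn.Sequential(
    nn.Conv2d(6, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 3, 3, padding=1),
)

# Freeze everything except the final layer.
for p in model.parameters():
    p.requires_grad = False
last = model[-1]
for p in last.parameters():
    p.requires_grad = True

opt = torch.optim.Adam(last.parameters(), lr=1e-4)
loss_fn = nn.L1Loss()

refs = torch.rand(4, 6, 64, 64)    # dummy stacked reference frames
target = torch.rand(4, 3, 64, 64)  # dummy ground-truth frames
for _ in range(100):               # content-specific fine-tuning
    opt.zero_grad()
    loss_fn(model(refs), target).backward()
    opt.step()

# Only the last layer's weights travel to the decoder as side information.
size_kb = sum(v.numel() * v.element_size()
              for v in last.state_dict().values()) / 1024
print(f"side information: {size_kb:.1f} KiB")  # ~6.8 KiB for this toy layer
```

Since encoder and decoder share the frozen backbone, synchronizing just this one layer is enough to adapt the synthesis to the content, consistent with the few-kilobyte overhead the abstract reports.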