Authors: Hadi Amirpour (Alpen-Adria-Universität Klagenfurt), Ekrem Çetinkaya (Alpen-Adria-Universität Klagenfurt), Christian Timmerer (Alpen-Adria-Universität Klagenfurt, Bitmovin), and Mohammad Ghanbari (School of Computer Science and Electronic Engineering, University of Essex, Colchester, UK)

Abstract: HTTP Adaptive Streaming (HAS) enables high quality streaming of video contents. In HAS, videos are divided into short intervals called segments, and each segment is encoded at various quality/bitrates to adapt to the available bandwidth. Multiple encodings of the same content impose high cost for video content providers. To reduce the time-complexity of encoding multiple representations, state-of-the-art methods typically encode the highest quality representation first and reuse the information gathered during its encoding to accelerate the encoding of the remaining representations. As encoding the highest quality representation requires the highest time-complexity compared to the lower quality representations, it would be a bottleneck in parallel encoding scenarios and the overall time-complexity will be limited to the time-complexity of the highest quality representation. In this paper and to address this problem, we consider all representations from the highest to the lowest quality representation as a potential, single reference to accelerate the encoding of the other, dependent representations. We formulate a set of encoding modes and assess their performance in terms of BD-Rate and time-complexity, using both VMAF and PSNR as objective metrics. Experimental results show that encoding a middle quality representation as a reference can significantly reduce the maximum encoding complexity and hence it is an efficient way of encoding multiple representations in parallel. Based on this fact, a fast multirate encoding method is proposed which utilizes depth and prediction mode of a middle quality representation to accelerate the encoding of the dependent representations.

The International MultiMedia Modeling Conference (MMM)

25-27 January 2021, Prague, Czech Republic

Link: https://mmm2021.cz

Keywords: HEVC, Video Encoding, Multirate Encoding, DASH
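The bottleneck argument in the abstract above can be illustrated with a minimal sketch. It is not the authors' method: the standalone encoding times are made-up numbers, the single `speedup` factor applied to every dependent representation is a simplifying assumption, and the model ignores any waiting time between the reference and its dependents. It only shows why the wall-clock bound in parallel encoding is the maximum over all representations, and why moving the reference to a middle quality can lower that maximum.

```python
def parallel_encoding_time(standalone_times, reference_index, speedup=0.5):
    """Wall-clock bound when all representations are encoded in parallel.

    The reference is encoded at its full standalone cost. Every dependent
    representation is assumed (hypothetically) to cost `speedup` times its
    standalone cost because it reuses information from the reference.
    """
    times = [
        t if i == reference_index else t * speedup
        for i, t in enumerate(standalone_times)
    ]
    # In parallel encoding, the slowest representation determines the total.
    return max(times)


# Hypothetical standalone encoding times, ordered from highest to lowest quality.
standalone = [100.0, 80.0, 60.0, 45.0, 30.0]

highest_as_reference = parallel_encoding_time(standalone, reference_index=0)
middle_as_reference = parallel_encoding_time(standalone, reference_index=2)

print(highest_as_reference)  # bounded by the unaccelerated highest quality encode
print(middle_as_reference)   # bounded by the cheaper middle quality encode
```

With these illustrative inputs, choosing the highest quality representation as reference leaves its full cost as the bound, while choosing the middle one makes the (cheaper) middle representation the slowest encode.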

Today, Klaus Schöffmann will present his keynote talk on “Deep Video Understanding and the User” at the ACM Multimedia 2020 Grand Challenge (GC) on “Deep Video Understanding”. The talk will highlight user aspects of automatic video content search based on deep neural networks, and show several examples where users have serious issues finding the correct content scene when video search systems rely too much on the “automatic search” scenario and ignore the user behind it. Registered users of ACMMM2020 can watch the talk online; the corresponding GC is scheduled for October 14 from 21:00-22:00 (15:00-16:00 NY Time).

Link: https://2020.acmmm.org/

Authors: Negin Ghamsarian (Alpen-Adria-Universität Klagenfurt), Mario Taschwer (Alpen-Adria-Universität Klagenfurt), Doris Putzgruber-Adamitsch (Klinikum Klagenfurt), Stephanie Sarny (Klinikum Klagenfurt), Klaus Schoeffmann (Alpen-Adria-Universität Klagenfurt)

Abstract: In cataract surgery, the operation is performed with the help of a microscope. Since the microscope enables watching real-time surgery by up to two people only, a major part of surgical training is conducted using the recorded videos. To optimize the training procedure with the video content, the surgeons require an automatic relevance detection approach. In addition to relevance-based retrieval, these results can be further used for skill assessment and irregularity detection in cataract surgery videos. In this paper, a three-module framework is proposed to detect and classify the relevant phase segments in cataract videos. Taking advantage of an idle frame recognition network, the video is divided into idle and action segments. To boost the performance in relevance detection, Mask R-CNN is utilized to detect the cornea in each frame where the relevant surgical actions are conducted. The spatio-temporal localized segments containing higher-resolution information about the pupil texture and actions, and complementary temporal information from the same phase, are fed into the relevance detection module. This module consists of four parallel recurrent CNNs responsible for detecting four relevant phases that have been defined with medical experts. The results will then be integrated to classify the action phases as irrelevant or one of four relevant phases. Experimental results reveal that the proposed approach outperforms static CNNs and different configurations of feature-based and end-to-end recurrent networks.

25th International Conference on Pattern Recognition, Milan, Italy

Link: https://www.micc.unifi.it/icpr2020/

Authors: Jesús Aguilar Armijo (Alpen-Adria-Universität Klagenfurt), Babak Taraghi (Alpen-Adria-Universität Klagenfurt), Christian Timmerer (Alpen-Adria-Universität Klagenfurt, Bitmovin), and Hermann Hellwagner (Alpen-Adria-Universität Klagenfurt)

Abstract: Adaptive video streaming systems typically support different media delivery formats, e.g., MPEG-DASH and HLS, replicating the same content multiple times into the network. Such a diversified system results in inefficient use of storage, caching, and bandwidth resources. The Common Media Application Format (CMAF) emerges to simplify HTTP Adaptive Streaming (HAS), providing a single encoding and packaging format of segmented media content and offering the opportunities of bandwidth savings, more cache hits and less storage needed. However, CMAF is not yet supported by most devices. To solve this issue, we present a solution where we maintain the main advantages of CMAF while supporting heterogeneous devices using different media delivery formats. For that purpose, we propose to dynamically convert the content from CMAF to the desired media delivery format at an edge node. We study the bandwidth savings with our proposed approach using an analytical model and simulation, resulting in bandwidth savings of up to 20% with different media delivery format distributions. We analyze the runtime impact of the required operations on the segmented content performed in two scenarios: the classic one, with four different media delivery formats, and the proposed scenario, using CMAF-only delivery through the network. We compare both scenarios with different edge compute power assumptions. Finally, we perform experiments in a real video streaming testbed delivering MPEG-DASH using CMAF content to serve a DASH and an HLS client, performing the media conversion for the latter one.

IEEE International Symposium on Multimedia (ISM)

2-4 December 2020, Naples, Italy

Link: https://www.ieee-ism.org/

Keywords: CMAF, Edge Computing, HTTP Adaptive Streaming (HAS)
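The idea of fetching one CMAF copy and converting it at the edge can be sketched with a toy accounting model. This is not the analytical model from the paper and does not reproduce its results: it counts only origin-to-edge bytes for a single segment, assumes every format copy has the same size, and uses hypothetical format names and request counts.

```python
def origin_to_edge_bytes(segment_bytes, requests_by_format, cmaf_only):
    """Bytes pulled from the origin to one edge node for a single segment.

    Classic scenario: every delivery format that is requested at least once
    must be fetched separately. CMAF-only scenario: one CMAF copy is fetched
    and the edge converts it to whatever format each client needs.
    """
    requested_formats = [f for f, n in requests_by_format.items() if n > 0]
    if not requested_formats:
        return 0
    copies = 1 if cmaf_only else len(requested_formats)
    return segment_bytes * copies


def relative_saving(classic_bytes, proposed_bytes):
    """Fraction of origin-to-edge traffic avoided by the proposed scenario."""
    if classic_bytes == 0:
        return 0.0
    return 1.0 - proposed_bytes / classic_bytes


# Hypothetical request counts per delivery format at one edge node.
requests = {"DASH": 3, "HLS": 2, "MSS": 0, "HDS": 0}

classic = origin_to_edge_bytes(1_000_000, requests, cmaf_only=False)
proposed = origin_to_edge_bytes(1_000_000, requests, cmaf_only=True)
saving = relative_saving(classic, proposed)
print(classic, proposed, saving)
```

In this toy model the saving depends only on how many distinct formats are requested at the edge; the paper's figures additionally depend on the format distribution and the overall delivery path, which are not modeled here.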

Abstract: Video accounts for the vast majority of today’s internet traffic and video coding is vital for efficient distribution towards the end-user. Software- and/or cloud-based video coding is becoming more and more attractive, specifically with the plethora of video codecs available right now (e.g., AVC, HEVC, VVC, VP9, AV1, etc.), which is also supported by the latest Bitmovin Video Developer Report 2020. Thus, improvements in video coding enabling efficient adaptive video streaming are a requirement for current and future video services. HTTP Adaptive Streaming (HAS) is now mainstream due to its simplicity, reliability, and standard support (e.g., MPEG-DASH). For HAS, the video is usually encoded in multiple versions (i.e., representations) of different resolutions, bitrates, codecs, etc. and each representation is divided into chunks (i.e., segments) of equal length (e.g., 2-10 sec) to enable dynamic, adaptive switching during streaming based on the user’s context conditions (e.g., network conditions, device characteristics, user preferences). In this context, most scientific papers in the literature target various improvements which are evaluated based on open, standard test sequences. We argue that optimizing video encoding for large scale HAS deployments is the next step in order to improve the Quality of Experience (QoE), while optimizing costs.

Session organizers: Christian Timmerer (Bitmovin, Austria), Mohammad Ghanbari (University of Essex, UK), and Alex Giladi (Comcast, USA).

Picture Coding Symposium (PCS), 29 June to 2 July 2021, UK

Link: https://pcs2021.org

DataCloud provides a novel paradigm covering the complete lifecycle of managing Big Data pipelines through discovery, design, simulation, provisioning, deployment, and adaptation across the Computing Continuum. Big Data pipelines in DataCloud interconnect the end-to-end industrial operations of collecting, preprocessing, and filtering data, transforming and delivering insights, training simulation models, and applying them in the cloud to achieve a business goal. DataCloud delivers a toolbox of new languages, methods, infrastructures, and prototypes for discovering, simulating, deploying, and adapting Big Data pipelines on heterogeneous and untrusted resources. DataCloud separates the design from the runtime aspects of Big Data pipeline deployment, empowering domain experts to take an active part in their definitions. The main exploitation targets the operation and monetization of the toolbox in European markets, and in the Spanish-speaking countries of Latin America. Its aim is to lower the technological entry barriers for the incorporation of Big Data pipelines in organizations’ business processes and make them accessible to a wider set of stakeholders regardless of the hardware infrastructure. DataCloud validates its plan through a strong selection of complementary business cases offered by SMEs and a large company targeting higher mobile business revenues in smart marketing campaigns, reduced production costs of sport events, trustworthy eHealth patient data management, and reduced time to production and better analytics in Industry 4.0 manufacturing. The balanced consortium consists of 11 partners from eight countries. It has three strong university partners specialised in Big Data, distributed computing, and high-productivity languages, led by a research institute. DataCloud gathers six SMEs and one large company (as technology providers and stakeholders/users/early adopters) that prioritise the business focus of the project in achieving high business impacts.

DataCloud is a 36-month project submitted to the H2020-ICT-2020-2 call as a Research and Innovation Action (RIA).

The principal investigator at the University of Klagenfurt is Univ.-Prof. Dr. Radu Prodan.

Christian Timmerer

Teaser: “Help me, Obi-Wan Kenobi. You’re my only hope,” said the hologram of Princess Leia in Star Wars: Episode IV – A New Hope (1977). This was the first time in cinematic history that the concept of holographic-type communication was illustrated. Almost five decades later, technological advancements are quickly moving this type of communication from science fiction to reality.

Authors: Jeroen van der Hooft (Ghent University), Maria Torres Vega (Ghent University), Tim Wauters (Ghent University), Christian Timmerer (Alpen-Adria-Universität Klagenfurt, Bitmovin), Ali C. Begen (Ozyegin University, Networked Media), Filip De Turck (Ghent University), and Raimund Schatz (AIT Austrian Institute of Technology)

Abstract: Technological improvements are rapidly advancing holographic-type content distribution. Significant research efforts have been made to meet the low-latency and high-bandwidth requirements set forward by interactive applications such as remote surgery and virtual reality. Recent research made six degrees of freedom (6DoF) for immersive media possible, where users may both move their heads and change their position within a scene. In this article, we present the status and challenges of 6DoF applications based on volumetric media, focusing on the key aspects required to deliver such services. Furthermore, we present results from a subjective study to highlight relevant directions for future research.

Link: IEEE Communications Magazine

Authors: Prateek Agrawal (University of Klagenfurt, Austria), Deepak Chaudhary (Lovely Professional University, India), Vishu Madaan (Lovely Professional University, India), Anatoliy Zabrovskiy (University of Klagenfurt, Austria), Radu Prodan (University of Klagenfurt, Austria), Dragi Kimovski (University of Klagenfurt, Austria), Christian Timmerer (University of Klagenfurt, Austria)

Abstract: Automated bank cheque verification using image processing is an attempt to complement the present cheque truncation system, as well as to provide an alternate methodology for the processing of bank cheques with minimal human intervention. When it comes to the clearance of bank cheques and monetary transactions, the process should not only be reliable and robust but also save time, which is one of the major factors for countries with large populations.

Authors: Ekrem Çetinkaya (Alpen-Adria-Universität Klagenfurt), Hadi Amirpour (Alpen-Adria-Universität Klagenfurt), Christian Timmerer (Alpen-Adria-Universität Klagenfurt, Bitmovin), and Mohammad Ghanbari (School of Computer Science and Electronic Engineering, University of Essex, Colchester, UK)

Abstract: HTTP Adaptive Streaming (HAS) is the most common approach for delivering video content over the Internet. The requirement to encode the same content at different quality levels (i.e., representations) in HAS is a challenging problem for content providers. Fast multirate encoding approaches try to accelerate this process by reusing information from previously encoded representations. In this paper, we use convolutional neural networks (CNNs) to speed up the encoding of multiple representations with a specific focus on parallel encoding. In parallel encoding, the overall time-complexity is limited to the maximum time-complexity of one of the representations that are encoded in parallel. Therefore, instead of reducing the time-complexity for all representations, the highest time-complexities are reduced. Experimental results show that the proposed method achieves significant time-complexity savings in parallel encoding scenarios (41%) with a slight increase in bitrate and quality degradation compared to the HEVC reference software.

Keywords: Video Coding, Convolutional Neural Networks, HEVC, HTTP Adaptive Streaming (HAS)

The FOG just moved from the Lake Wörthersee to ITEC ;)! Lead researchers Dragi Kimovski and Narges Mehran from Radu Prodan’s Lab and Josef Hammer from Hermann Hellwagner’s Lab set up UNI-KLU’s first FOG infrastructure with 40 computing nodes, including 5 GPU-enabled ones.

Why should Cloud have all the FUN xD?