Dr. Gerhard Burian and Mag. Vladislav Kashansky participated on behalf of the ADAPT collaboration in the international conference “Climate protection: state of play, division of labor, steps forward”, held at OeNB, Vienna, on 07.10.2021.
The first face-to-face DataCloud Meeting took place in Rome, Italy, from October 04-06, 2021. The consortium discussed the architecture and the business cases in preparation for the first project review.
Congratulations to Natalia Sokolova, who got her journal paper on “Automatic detection of pupil reactions in cataract surgery videos” accepted in the PLOS ONE journal. This work has been (co-)authored by Natalia Sokolova, Klaus Schoeffmann, Mario Taschwer, Stephanie Sarny, Doris Putzgruber-Adamitsch, and Yosuf El-Shabrawi.
Congratulations to Negin Ghamsarian et al., who got their paper “ReCal-Net: Joint Region-Channel-Wise Calibrated Network for Semantic Segmentation in Cataract Surgery Videos” accepted at the International Conference on Neural Information Processing (ICONIP 2021).
Abstract: Semantic segmentation in surgical videos is a prerequisite for a broad range of applications towards improving surgical outcomes and surgical video analysis. However, semantic segmentation in surgical videos involves many challenges. In particular, in cataract surgery, various features of the relevant objects such as blunt edges, color and context variation, reflection, transparency, and motion blur pose a challenge for semantic segmentation. In this paper, we propose a novel convolutional module termed as ReCal module, which can calibrate the feature maps by employing region intra-and-inter-dependencies and channel-region cross-dependencies. This calibration strategy can effectively enhance semantic representation by correlating different representations of the same semantic label, considering a multi-angle local view centering around each pixel. Thus the proposed module can deal with distant visual characteristics of unique objects as well as cross-similarities in the visual characteristics of different objects. Moreover, we propose a novel network architecture based on the proposed module termed as ReCal-Net. Experimental results confirm the superiority of ReCal-Net compared to rival state-of-the-art approaches for all relevant objects in cataract surgery. Moreover, ablation studies reveal the effectiveness of the ReCal module in boosting semantic segmentation accuracy.
The paper “Fast Multi-Resolution and Multi-Rate Encoding for HTTP Adaptive Streaming Using Machine Learning” from the ATHENA lab has been nominated for the Best New Streaming Innovation Award in the Streaming Media Readers’ Choice Awards 2021.
Voting takes place on the awards’ website and is open until October 4. The paper is listed under the Best New Streaming Innovation Award section.
More information about the paper can be found here.
On August 30th 2021, Andreas Leibetseder successfully defended his thesis on “Extracting and Using Medical Expert Knowledge to Advance Video Analysis for Gynecologic Laparoscopy” under the supervision of Prof. Klaus Schöffmann. The defense was chaired by Prof. Hermann Hellwagner and the examiners were Prof. Oge Marques (Florida Atlantic University) and Prof. Mathias Lux (Klagenfurt University). Congratulations to Dr. Leibetseder for this great achievement!
Title: On The Impact of Viewing Distance on Perceived Video Quality
Link: IEEE Visual Communications and Image Processing (VCIP 2021) 5-8 December 2021, Munich, Germany
Authors: Hadi Amirpour (Alpen-Adria-Universität Klagenfurt), Raimund Schatz (AIT Austrian Institute of Technology, Austria), Mohammad Ghanbari (School of Computer Science and Electronic Engineering, University of Essex, Colchester, UK), and Christian Timmerer (Alpen-Adria-Universität Klagenfurt)
Abstract: Due to the growing importance of optimizing quality and efficiency of video streaming delivery, accurate assessment of user perceived video quality becomes increasingly relevant. However, due to the wide range of viewing distances encountered in real-world viewing settings, actually perceived video quality can vary significantly in everyday viewing situations. In this paper, we investigate and quantify the influence of viewing distance on perceived video quality. A subjective experiment was conducted with full HD sequences at three different stationary viewing distances, with each video sequence being encoded at three different quality levels. Our study results confirm that the viewing distance has a significant influence on the quality assessment. In particular, they show that an increased viewing distance generally leads to an increased perceived video quality, especially at low media encoding quality levels. In this context, we also provide an estimation of potential bitrate savings that knowledge of actual viewing distance would enable in practice.
Since current objective video quality metrics do not systematically take into account viewing distance, we also analyze and quantify the influence of viewing distance on the correlation between objective and subjective metrics. Our results confirm the need for distance-aware objective metrics when accurate prediction of perceived video quality in real-world environments is required.
Title: Improving Per-title Encoding for HTTP Adaptive Streaming by Utilizing Video Super-resolution
Link: IEEE Visual Communications and Image Processing (VCIP 2021) 5-8 December 2021, Munich, Germany
Authors: Hadi Amirpour (Alpen-Adria-Universität Klagenfurt), Hannaneh Barahouei Pasandi (Virginia Commonwealth University), Mohammad Ghanbari (School of Computer Science and Electronic Engineering, University of Essex, Colchester, UK), and Christian Timmerer (Alpen-Adria-Universität Klagenfurt)
Abstract: In per-title encoding, to optimize a bitrate ladder over spatial resolution, each video segment is downscaled to a set of spatial resolutions and they are all encoded at a given set of bitrates. To find the highest quality resolution for each bitrate, the low-resolution encoded videos are upscaled to the original resolution, and a convex hull is formed based on the scaled qualities. Deep learning-based video super-resolution (VSR) approaches show a significant gain over traditional approaches and they are becoming more and more efficient over time. This paper improves the per-title encoding over the upscaling methods by using deep neural network-based VSR algorithms as they show a significant gain over traditional approaches. Utilizing a VSR algorithm by improving the quality of low-resolution encodings can improve the convex hull. As a result, it will lead to an improved bitrate ladder. To avoid bandwidth wastage at perceptually lossless bitrates a maximum threshold for the quality is set and encodings beyond it are eliminated from the bitrate ladder. Similarly, a minimum threshold is set to avoid low-quality video delivery. The encodings between the maximum and minimum thresholds are selected based on one Just Noticeable Difference. Our experimental results show that the proposed per-title encoding results in a 24% bitrate reduction and 53% storage reduction compared to the state-of-the-art method.
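The ladder construction described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the VMAF-like 0-100 quality scale, the threshold values, the JND step of 6 points, the greedy selection rule, and all function and variable names are hypothetical, and the sample quality values are made up for the example. The convex hull step is approximated by keeping, for each bitrate, the resolution with the highest quality after upscaling.

```python
from typing import Dict, List, Tuple

# (bitrate_kbps, resolution) -> quality measured after upscaling to the
# original resolution. The scale is assumed to be VMAF-like (0-100).
Encodings = Dict[Tuple[int, str], float]
Rung = Tuple[int, str, float]


def best_resolution_per_bitrate(encodings: Encodings) -> List[Rung]:
    """For each bitrate, keep the resolution with the highest upscaled quality."""
    best: Dict[int, Tuple[str, float]] = {}
    for (bitrate, resolution), quality in encodings.items():
        if bitrate not in best or quality > best[bitrate][1]:
            best[bitrate] = (resolution, quality)
    # Return the rungs sorted by increasing bitrate.
    return [(b, r, q) for b, (r, q) in sorted(best.items())]


def build_ladder(encodings: Encodings, q_min: float, q_max: float,
                 jnd: float) -> List[Rung]:
    """Drop rungs outside [q_min, q_max], then keep rungs at least one JND apart."""
    ladder: List[Rung] = []
    for bitrate, resolution, quality in best_resolution_per_bitrate(encodings):
        if quality < q_min or quality > q_max:
            continue  # below the minimum or beyond the maximum threshold
        # Assumed greedy rule: keep a rung only if it improves on the
        # previously kept rung by at least one JND.
        if not ladder or quality - ladder[-1][2] >= jnd:
            ladder.append((bitrate, resolution, quality))
    return ladder


# Made-up sample measurements for two resolutions and six bitrates.
SAMPLE: Encodings = {
    (250, "540p"): 35.0, (250, "1080p"): 25.0,
    (500, "540p"): 55.0, (500, "1080p"): 48.0,
    (1000, "540p"): 68.0, (1000, "1080p"): 66.0,
    (2000, "540p"): 74.0, (2000, "1080p"): 80.0,
    (3000, "540p"): 75.0, (3000, "1080p"): 84.0,
    (4000, "540p"): 76.0, (4000, "1080p"): 90.0,
    (8000, "540p"): 77.0, (8000, "1080p"): 96.0,
}

for rung in build_ladder(SAMPLE, q_min=40.0, q_max=94.0, jnd=6.0):
    print(rung)
```

With these sample values, the 250 kbps rung falls below the minimum threshold, the 8000 kbps rung exceeds the maximum threshold, and the 3000 kbps rung is skipped because it is less than one JND better than the 2000 kbps rung. A better upscaler such as a VSR model would raise the quality of the low-resolution entries and could therefore change which resolution wins at each bitrate.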
The Quality of Experience (QoE) is well-defined in QUALINET white papers [here, here], but its assessment and metrics are subject to research. The aim of this workshop on “Quality of Immersive Media: Assessment and Metrics” is to provide a forum for researchers and practitioners to discuss the latest findings in this field. The scope of this workshop is (i) to raise awareness about MPEG efforts in the context of quality of immersive visual media and (ii) to invite experts (outside of MPEG) to present new techniques relevant to this workshop.
Quality assessments in the context of the MPEG standardization process typically serve two purposes: (1) to foster decision-making on the tool adoptions during the standardization process and (2) to validate the outcome of a standardization effort compared to an established anchor (i.e., for verification testing).
We kindly invite you to the first online MPEG AG 5 Workshop on Quality of Immersive Media: Assessment and Metrics as follows.
Logistics (online):
- Date: October 5, 2021
- Time slot: 15:00-17:00 UTC
- Zoom registration link: https://iso.zoom.us/meeting/register/tJEpce6sqTgjH9AgH0Q5nINJlyCvlPOLOtzQ
Program/Speakers:
15:00-15:10: Joel Jung & Christian Timmerer (AhG co-chairs): Welcome notice
15:10-15:30: Mathias Wien (AG 5 convenor): MPEG Visual Quality Assessment: Tasks and Perspectives
Abstract: The Advisory Group on MPEG Visual Quality Assessment (ISO/IEC JTC1 SC29/AG5) was founded in 2020 with the goal of selecting and designing subjective quality evaluation methodologies and objective quality metrics for the assessment of visual coding technologies in the context of the MPEG standardization work. In this talk, the current work items, as well as perspectives and first achievements of the group, are presented.
15:30-15:50: Aljosa Smolic: Perception and Quality of Immersive Media
Abstract: Interest in immersive media increased significantly over recent years. Besides applications in entertainment, culture, health, industry, etc., telepresence and remote collaboration gained importance due to the pandemic and climate crisis. Immersive media have the potential to increase social integration and to reduce greenhouse gas emissions. As a result, technologies along the whole pipeline from capture to display are maturing and applications are becoming available, creating business opportunities. One aspect of immersive technologies that is still relatively undeveloped is the understanding of perception and quality, including subjective and objective assessment. The interactive nature of immersive media poses new challenges to estimation of saliency or visual attention, and to the development of quality metrics. The V-SENSE lab of Trinity College Dublin addresses these questions in current research. This talk will highlight corresponding examples in 360 VR video, light fields, volumetric video and XR.
15:50-16:00: Break/Discussions
16:00-16:20: Jesús Gutiérrez: Quality assessment of immersive media: Recent activities within VQEG
Abstract: This presentation will provide an overview of the recent activities carried out on quality assessment of immersive media within the Video Quality Experts Group (VQEG), particularly within the Immersive Media Group (IMG). Among other efforts, outcomes will be presented from the cross-lab test (carried out by ten different labs) in order to assess and validate subjective evaluation methodologies for 360° videos, which was instrumental in the development of the ITU-T Recommendation P.919. Also, insights will be provided on the current plans on exploring the evaluation of the quality of experience of immersive communication systems, considering different technologies such as 360° video, point cloud, free-viewpoint video, etc.
16:20-16:40: Alexander Raake: <to-be-provided>
16:40-17:00: <to-be-provided>
17:00: Conclusions
Link: IEEE Global Communications Conference 2021
7-11 December 2021 // Madrid, Spain // Hybrid: In-Person and Virtual Conference Connecting Cultures around the Globe
Authors: F. Tashtarian*, R. Falanji‡, A. Bentaleb+, A. Erfanian*, P. S. Mashhadi§, C. Timmerer*, H. Hellwagner*, R. Zimmermann+
Christian Doppler Laboratory ATHENA, Institute of Information Technology, Alpen-Adria-Universität Klagenfurt, Austria*
Department of Mathematical Science, Sharif University of Technology, Tehran, Iran‡
Department of Computer Science, School of Computing, National University of Singapore (NUS)+
Center for Applied Intelligent Systems Research (CAISR), Halmstad University, Sweden§
Abstract: Recent years have seen tremendous growth in HTTP adaptive live video traffic over the Internet. In the presence of highly dynamic network conditions and diverse request patterns, existing yet simple hand-crafted heuristic approaches for serving client requests at the network edge might incur a large overhead and significant increase in time complexity. Therefore, these approaches might fail in delivering acceptable Quality of Experience (QoE) to end users. To bridge this gap, we propose ROPL, a learning-based client request management solution at the edge that leverages the power of the recent breakthroughs in deep reinforcement learning, to serve requests of concurrent users joining various HTTP-based live video channels. ROPL is able to react quickly to any changes in the environment, making accurate decisions to serve client requests, which results in satisfactory user QoE. We validate the efficiency of ROPL through trace-driven simulations and a real-world setup. Experimental results from real-world scenarios confirm that ROPL outperforms existing heuristic-based approaches in terms of QoE by a factor of up to 3.7×.
Index Terms—Network Edge; Request Serving; HTTP Live Streaming; Low Latency; QoE; Deep Reinforcement Learning.