Medical Multimedia Information Systems

Sabrina Kletz presented her work at the MIAR Workshop @ MICCAI 2019


Sabrina Kletz presented the paper “Learning the Representation of Instrument Images in Laparoscopy Video” at the MIAR Workshop @ MICCAI 2019 in Shenzhen, China.

Authors: Sabrina Kletz, Klaus Schoeffmann, Heinrich Husslein

Abstract: Automatic recognition of instruments in laparoscopy videos poses many challenges that need to be addressed like identifying multiple instruments appearing in various representations and in different lighting conditions which in turn may be occluded by other instruments, tissue, blood or smoke. Considering these challenges it may be beneficial for recognition approaches that instrument frames are first detected in a sequence of video frames for further investigating only these frames. This pre-recognition step is also relevant for many other classification tasks in laparoscopy videos such as action recognition or adverse event analysis. In this work, we address the task of binary classification to recognize video frames as either instrument or non-instrument images. We examine convolutional neural network models to learn the representation of instrument frames in videos and take a closer look at learned activation patterns. For this task, GoogLeNet together with batch normalization is trained and validated using a publicly available dataset for instrument count classifications. We compare transfer learning with learning from scratch and evaluate on datasets from cholecystectomy and gynecology. The evaluation shows that fine-tuning a pre-trained model on the instrument and non-instrument images is much faster and more stable in learning than training a model from scratch.

Conference: 2019 International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), October 13–17, 2019, Shenzhen, China

Track: Medical Imaging and Augmented Reality (MIAR) Workshop @MICCAI

Natalia Sokolova

MMM’20: Evaluating the Generalization Performance of Instrument Classification in Cataract Surgery Videos


Our paper has been accepted for publication at the MMM 2020 Conference on Multimedia Modeling. The work was conducted in the context of the ongoing OVID project.

Authors: Natalia Sokolova, Klaus Schoeffmann, Mario Taschwer (AAU Klagenfurt); Doris Putzgruber-Adamitsch, Yosuf El-Shabrawi (Klinikum Klagenfurt)

In the field of ophthalmic surgery, many clinicians nowadays record their microscopic procedures with a video camera and use the recorded footage for later purposes, such as forensics, teaching, or training. However, in order to efficiently use the video material after surgery, the video content needs to be analyzed automatically. Important semantic content to be analyzed and indexed in these short videos are operation instruments, since they provide an indication of the corresponding operation phase and surgical action. Related work has already shown that it is possible to accurately detect instruments in cataract surgery videos. However, the underlying dataset (from the CATARACTS challenge) has very good visual quality, which does not reflect the typical quality of videos acquired in general hospitals. In this paper, we therefore analyze the generalization performance of deep learning models for instrument recognition in terms of dataset change. More precisely, we trained models such as ResNet-50, Inception v3 and NASNet Mobile on a dataset of high visual quality (CATARACTS) and tested them on another dataset with low visual quality (Cataract-101), and vice versa. Our results show that the generalizability is rather low in general, but clearly worse for the model trained on the high-quality dataset. Another important observation is that the trained models are able to detect similar instruments in the other dataset even if their appearance is different.

The paper “GLENDA: Gynecologic Laparoscopy Endometriosis Dataset” has been accepted

The paper “GLENDA: Gynecologic Laparoscopy Endometriosis Dataset” has been accepted for publication at the Multimedia Datasets for Repeatable Experimentation (MDRE) special session, co-located with the 26th International Conference on Multimedia Modeling, MMM 2020, to be held in Daejeon, Korea (January 5-8, 2020).

Authors: Andreas Leibetseder, Sabrina Kletz, Klaus Schoeffmann (Alpen-Adria Universität Klagenfurt), Simon Keckstein (Ludwig-Maximilians-University Munich), Jörg Keckstein (Ulm University)

Abstract: Gynecologic laparoscopy as a type of minimally invasive surgery (MIS) is performed via a live feed of a patient’s abdomen surveying the insertion and handling of various instruments for conducting treatment. Adopting this kind of surgical intervention not only facilitates a great variety of treatments; the possibility of recording said video streams is also essential for numerous post-surgical activities, such as treatment planning, case documentation and education. Nonetheless, the process of manually analyzing surgical recordings, as it is carried out in current practice, usually proves tediously time-consuming. In order to improve upon this situation, more sophisticated computer vision as well as machine learning approaches are actively being developed. Since most such approaches rely heavily on sample data, which especially in the medical field is only sparsely available, with this work we publish the Gynecologic Laparoscopy ENdometriosis DAtaset (GLENDA) – an image dataset containing region-based annotations of a common medical condition named endometriosis, i.e. the dislocation of uterine-like tissue. The dataset is the first of its kind and it has been created in collaboration with leading medical experts in the field.

Keywords: lesion detection, endometriosis localization, medical dataset, region-based annotations, gynecologic laparoscopy

Acknowledgement: This work was funded by the FWF Austrian Science Fund under grant P 32010-N38.

Andreas Leibetseder @ Florida Atlantic University


Within the scope of the AAU’s Young Scientists Mentoring Programme, Andreas Leibetseder is visiting his mentor Oge Marques, Professor at the Department of Computer and Electrical Engineering and Computer Science at Florida Atlantic University (FAU) in Boca Raton, Florida. During his stay over the course of April, he intends to advance his PhD studies by focusing on the sub-topic of region-based endometriosis classification in laparoscopic media. He intends to approach this problem by adapting and applying deep learning technologies, profiting from the knowledge and insights of his mentor as well as other students of the local mlab research group. Currently, several work packages have been defined, which include investigating lesion detection approaches on radiological image datasets for application in the endoscopic domain.

8th Video Browser Showdown (VBS) at MMM2019 in Thessaloniki


Last week, Klaus Schoeffmann co-organized the 8th Video Browser Showdown (VBS) at MMM2019 in Thessaloniki, and it was a great success. For the first time they used the V3C1 dataset (Part 1 of the Vimeo Creative Commons Collection), which consists of 7475 video files that amount to about 1000 hours of content. The six participating teams (including an ITEC team with Andreas Leibetseder) could solve all visual and textual Known-Item Search (KIS) tasks, as well as all Ad-Hoc Video Search (AVS) tasks within a short amount of time! The teams have clearly demonstrated that their sophisticated video retrieval systems are very powerful and allow fast and effective content-based search in videos. They look forward to the next VBS in January 2020 in Daejeon, Korea at MMM2020!

Sabrina Kletz

Sabrina Kletz @ ACM Multimedia Conference 2018, Seoul

On Reducing Effort in Evaluating Laparoscopic Skills

Abstract: Training and evaluation of laparoscopic skills have become an important aspect of young surgeons’ education. The evaluation process is currently performed manually by experienced surgeons through reviewing video recordings of laparoscopic procedures for detecting technical errors using conventional video players and specific pen and paper rating schemes. The problem is that the manual review process is time-consuming and exhausting, but nevertheless necessary to support young surgeons in their educational training. Motivated by the need to reduce the effort in evaluating laparoscopic skills, this PhD project aims at investigating state-of-the-art content analysis approaches for finding error-prone video sections in surgery videos. In this proposal, the focus specifically lies on performance assessment in gynecologic laparoscopy using the Generic Error Rating Tool (GERT).

Conference: 2018 ACM Multimedia Conference, October 22–26, 2018, Seoul, Republic of Korea

Track: Doctoral Symposium

Mathias Lux

Talk on Indexing of Large Data Sets in Image Retrieval at Simula Metropolitan, Oslo, NO


Mathias Lux was invited to give a talk at Simula Metropolitan, a joint research center of SIMULA research labs and Oslo Metropolitan University. Besides the talk he took the opportunity to work for two days with the people at SIMULA and talk about future and ongoing projects.

Sabrina Kletz

Sabrina Kletz presented her poster “On Reducing Effort in Evaluating Laparoscopic Skills” at the Machine Learning Summer School (MLSS 2018), Buenos Aires, Argentina.


Abstract: Training and evaluation of laparoscopic skills have become an important aspect of young surgeons’ education. The evaluation process is currently performed manually by experienced surgeons through reviewing video recordings of laparoscopic procedures for detecting technical errors using conventional video players and specific pen and paper rating schemes. The problem is that the manual review process is time-consuming and exhausting, but nevertheless necessary to support young surgeons in their educational training. Motivated by the need to reduce the effort in evaluating laparoscopic skills, we investigate state-of-the-art content analysis approaches for finding error-prone video sections.


Print of the Poster

Seven ITEC members participate and present their papers/posters at MMSys 2018, June 12-15, 2018


ACM MMSys 2018: Multi-Codec DASH Dataset

Authors: Anatoliy Zabrovskiy (Petrozavodsk State University & Alpen-Adria-Universität Klagenfurt), Christian Feldmann (Bitmovin Inc.), and Christian Timmerer (Alpen-Adria-Universität Klagenfurt & Bitmovin Inc.)

Abstract: The number of bandwidth-hungry applications and services is constantly growing. HTTP adaptive streaming of audio-visual content accounts for the majority of today’s internet traffic. Although internet bandwidth is also constantly increasing, audio-visual compression technology remains indispensable, and we currently face the challenge of dealing with multiple video codecs. This paper proposes a multi-codec DASH dataset comprising AVC, HEVC, VP9, and AV1 in order to enable interoperability testing and streaming experiments for the efficient usage of these codecs under various conditions. We adopt state-of-the-art encoding and packaging options and also provide basic quality metrics along with the DASH segments. Additionally, we briefly introduce a multi-codec DASH scheme and possible usage scenarios. Finally, we provide a preliminary evaluation of the encoding efficiency in the context of HTTP adaptive streaming services and applications.

Packet Video 2018: Investigation of YouTube regarding Content Provisioning for HTTP Adaptive Streaming

Authors: Armin Trattnig (Bitmovin Inc.), Christian Timmerer (Alpen-Adria-Universität Klagenfurt / Bitmovin Inc.), and Christopher Mueller (Bitmovin Inc.)

Abstract: About 300 hours of video are uploaded to YouTube every minute. The main technology to deliver YouTube content to various clients is HTTP adaptive streaming, and the majority of today’s internet traffic comprises streaming audio and video. In this paper, we investigate content provisioning for HTTP adaptive streaming under predefined aspects representing content features as well as upload characteristics, and apply it to YouTube. Additionally, we compare YouTube’s content upload and processing functions with a commercially available video encoding service. The results reveal insights into YouTube’s content upload and processing functions, and the methodology can be applied to similar services. All experiments conducted within the paper allow for reproducibility thanks to the usage of open source tools, publicly available datasets, and scripts used to conduct the experiments on virtual machines.

Packet Video 2018: Dynamic Adaptive Point Cloud Streaming

Authors: Mohammad Hosseini (University of Illinois at Urbana-Champaign (UIUC)) and Christian Timmerer (Alpen-Adria-Universität Klagenfurt, Bitmovin Inc.)

Abstract: High-quality point clouds have recently gained interest as an emerging form of representing immersive 3D graphics. Unfortunately, these 3D media are bulky and severely bandwidth intensive, which makes streaming to resource-limited and mobile devices difficult. This has called researchers to propose efficient and adaptive approaches for streaming of high-quality point clouds. In this paper, we run a pilot study towards dynamic adaptive point cloud streaming, and extend the concept of dynamic adaptive streaming over HTTP (DASH) towards DASH-PC, a dynamic adaptive bandwidth-efficient and view-aware point cloud streaming system. DASH-PC can tackle the huge bandwidth demands of dense point cloud streaming while at the same time semantically linking to human visual acuity to maintain high visual quality when needed. In order to describe the various quality representations, we propose multiple thinning approaches to spatially sub-sample point clouds in the 3D space, and design a DASH Media Presentation Description manifest specific for point cloud streaming. Our initial evaluations show that we can achieve significant bandwidth and performance improvement on dense point cloud streaming with minor negative quality impacts compared to the baseline scenario when no adaptation is applied.

A Network Traffic and Player Movement Model to Improve Networking for Competitive Online Games

Authors: Philipp Moll, Mathias Lux, Sebastian Theuermann, Hermann Hellwagner
Abstract: The popularity of computer games is enormously high and is still growing every year. Despite the popularity of gaming, the networking part of computer games relies on decade-old technologies, which were never intended to be used for low-latency communication and are often the cause of overloaded and crashing game servers during peak hours. In order to improve the current state-of-the-art technologies, research in the networking field has to be conducted, but is challenging due to the low availability of up-to-date datasets and network traces. Modern networking solutions of computer games try to take the players’ activities as well as geographical closeness of different players in the virtual world into account, in order to achieve a high user satisfaction while keeping the network activity as low as possible. In this paper, we analyze the Battle Royale game mode of Fortnite as an example of a popular online game with demanding technical requirements with respect to networking. Based on the results of our analysis, we extrapolate player movement patterns as well as network traces, which can be used to study how to improve our current networking technology for online gaming, and to investigate possibilities to replace it by novel networking solutions, such as information-centric networking.

Workshop: NetGames 2018

Video Dataset of 101 Cataract Surgeries

Authors: Klaus Schoeffmann, Mario Taschwer, Stephanie Sarny, Bernd Münzer, Jürgen Primus, Doris Putzgruber
Abstract: Cataract surgery is one of the most frequently performed microscopic surgeries in the field of ophthalmology. The goal behind this kind of surgery is to replace the human eye lens with an artificial one, an intervention that is often required due to aging. The entire surgery is performed under microscopy, but co-mounted cameras allow the procedure to be recorded and archived. Currently, the recorded videos are used in a postoperative manner for documentation and training. An additional benefit of recording cataract videos is that they enable video analytics (i.e., manual and/or automatic video content analysis) to investigate medically relevant research questions (e.g., the cause of complications). This, however, necessitates a medical multimedia information system trained and evaluated on existing data, which is currently not publicly available. In this work we provide a public video dataset of 101 cataract surgeries that were performed by four different surgeons over a period of 9 months. These surgeons are grouped into moderately experienced and highly experienced surgeons (assistant vs. senior physicians), providing the basis for experience-based video analytics. All videos have been annotated with quasi-standardized operation phases by a senior ophthalmic surgeon.
Links: Preprint of the Paper

OVID – Relevance Detection in Ophthalmic Surgery Videos

OVID Relevance Detection in Ophthalmic Surgery Videos
Project partner: Klinikum Klagenfurt (KABEG)
Resources: 3 PhD students for 3 years, 1 student assistant for 1.25 years

Computer scientists and physicians collaborate in an interdisciplinary research project with a focus on computer science, in which methods for the automatic detection of relevant temporal segments in ophthalmic surgery videos are to be developed and evaluated. The main goal is to model relevance with respect to the use of video segments for medical teaching, research, and documentation. Relevance models are learned automatically by machine learning methods, with surgery videos annotated by surgeons serving as training data.