The manuscript “The Workflow Trace Archive: Open-Access Data from Public and Private Computing Infrastructures” has been accepted for publication in the A* ranked IEEE Transactions on Parallel and Distributed Systems (TPDS) journal.
Authors: Laurens Versluis, Roland Mathá, Sacheendra Talluri, Tim Hegeman, Radu Prodan, Ewa Deelman, and Alexandru Iosup
Abstract: Realistic, relevant, and reproducible experiments often need input traces collected from real-world environments. We focus in this work on traces of workflows—common in datacenters, clouds, and HPC infrastructures. We show that the state-of-the-art in using workflow-traces raises important issues: (1) the use of realistic traces is infrequent, and (2) the use of realistic, open-access traces even more so. Alleviating these issues, we introduce the Workflow Trace Archive (WTA), an open-access archive of workflow traces from diverse computing infrastructures and tooling to parse, validate, and analyze traces. The WTA includes >48 million workflows captured from >10 computing infrastructures, representing a broad diversity of trace domains and characteristics. To emphasize the importance of trace diversity, we characterize the WTA contents and analyze in simulation the impact of trace diversity on experiment results. Our results indicate significant differences in characteristics, properties, and workflow structures between workload sources, domains, and fields.
Acknowledgments: This work is supported by the projects Vidi MagnaData, Commit, the European Union’s Horizon 2020 Research and Innovation Programme, grant agreement number 801091 “ASPIDE”, and the National Science Foundation award number 1664162.
The first review of the ASPIDE project took place on 25.02.2020 on the premises of the European Commission in Luxembourg. During the review, a live demo of the platform for supporting extreme-scale applications was presented, and future research and development activities were discussed with the reviewers.
We are #ARTICONF, we are people. We are federating trust.
Here is a photo of us today after our periodic review in Brussels. Thank you Peter Friess, @FraukeBehrendt, @joaomagalhaes, @gravesen for your feedback on our project of #decentralised #socialmedia ⛓️ pic.twitter.com/4ovzPJbHBj
— ARTICONF (@articonf) February 20, 2020
Bitmovin, a world leader in online video technology, is teaming up with the University of Klagenfurt, Institute of Information Technology (ITEC) and the Austrian Federal Ministry of Digital and Economic Affairs (BMDW) in a multi-million Euro research project to uncover techniques that will enhance the video streaming experiences of the future. The joint project establishes a dedicated research team to investigate potential new tools and methodologies for encoding, transport and playback of live and on-demand video using the HTTP Adaptive Streaming protocol that is widely used by online video and TV providers. The resulting findings will help empower the creation of next-generation solutions for higher quality video experiences at lower latency, while also potentially reducing storage and distribution costs.
Interview with @RaduProdanAAU at #ESMH The days when people work together in an office, are over…
In the future, ‘teams’ will be social networks… able to connect to other federated networks of clouds and at the same time establish privacy and trust. https://t.co/GwaVaQTFFJ
— ARTICONF (@articonf) January 30, 2020
The ITEC team participated in the HiPEAC 2020 International Workshop on Exascale programming models for extreme data with a presentation titled “Monitoring data collection and mining for Exascale systems”. The team also attended the co-located ASPIDE meeting and took an active part in deciding the project's next research activities.
Title of the talk: Mobility-Aware Scheduling of Extreme Data Workflows across the Computing Continuum
Abstract: The appearance of the Fog/Edge computing paradigm, as an extension of the computing continuum closer to the edge of the network, opens up important opportunities for executing complex business and scientific workflows near the data sources. The main characteristics of these workflows are (i) their distributed nature, (ii) the vast amount of data (on the order of petabytes) they generate, and (iii) their strict latency requirements. Current workflow management approaches rely exclusively on Cloud data centers, which, because of their geographical distance from the data sources, can increase latency and cause violations of workflow requirements. It is therefore essential to research novel concepts for partially offloading complex workflows closer to where the data is generated, thus reducing the communication latency and the need for frequent data transfers.
In this talk we will explore the potential of the computing continuum for scheduling and partial offloading of complex workflows with strict response time requirements and expose the resource provisioning challenges related to the heterogeneity and mobility of the Fog/Edge environment. We will then discuss a novel mobility-aware Pareto-based approach for task offloading across the continuum, which considers three optimization objectives, namely response time, reliability, and financial cost. In addition, the approach introduces a Markov model to perform a single-step predictive analysis of the mobility of the Fog/Edge devices, thus constraining the task offloading optimization problem to devices that do not frequently move (roam) within the computing continuum. As a conclusion to the talk, we will discuss the efficiency of the presented approach, based on both a simulated and a real-world testbed environment tailored for a set of real-world biomedical, meteorological and astronomy workflows.
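To make the two ideas in the abstract concrete, the following is a minimal Python sketch, not the method presented in the talk. It assumes a hypothetical two-state mobility model ("stay"/"roam") with made-up transition probabilities, a hypothetical stability threshold `min_stay`, and invented device figures. It first uses a single Markov step to discard devices that are likely to roam, then keeps the Pareto-optimal devices with respect to response time, reliability, and cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    state: str            # current mobility state: "stay" or "roam" (assumed model)
    response_time: float  # seconds, lower is better
    reliability: float    # probability in [0, 1], higher is better
    cost: float           # monetary units, lower is better


# Hypothetical one-step transition probabilities: TRANSITIONS[current][next].
TRANSITIONS = {
    "stay": {"stay": 0.9, "roam": 0.1},
    "roam": {"stay": 0.4, "roam": 0.6},
}


def stay_probability(device):
    """Single-step prediction: probability the device is stationary next step."""
    return TRANSITIONS[device.state]["stay"]


def dominates(a, b):
    """True if a is no worse than b on all three objectives and better on one."""
    no_worse = (a.response_time <= b.response_time
                and a.reliability >= b.reliability
                and a.cost <= b.cost)
    better = (a.response_time < b.response_time
              or a.reliability > b.reliability
              or a.cost < b.cost)
    return no_worse and better


def pareto_front(devices):
    """Keep the devices that no other device dominates."""
    return [d for d in devices if not any(dominates(o, d) for o in devices)]


def candidate_devices(devices, min_stay=0.5):
    """Filter out devices likely to roam, then return the Pareto front."""
    stable = [d for d in devices if stay_probability(d) >= min_stay]
    return pareto_front(stable)


# Invented example data, for illustration only.
DEVICES = [
    Device("edge-a", "stay", 0.20, 0.95, 3.0),
    Device("edge-b", "roam", 0.10, 0.99, 1.0),
    Device("fog-c", "stay", 0.35, 0.97, 2.0),
    Device("cloud-d", "stay", 0.90, 0.90, 4.0),
]

if __name__ == "__main__":
    for device in candidate_devices(DEVICES):
        print(device.name)
```

In this toy data, `edge-b` is the best device on every objective but is currently roaming, so the mobility filter removes it before the Pareto step. A real scheduler would estimate the transition probabilities from observed device traces rather than hard-coding them.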