Distributed and Parallel Systems
HiPEAC magazine https://www.hipeac.net/news/#/magazine/
HiPEACINFO 68, pages 27-28.
Authors: Dragi Kimovski (Alpen-Adria-Universität Klagenfurt, Austria), Narges Mehran (Alpen-Adria-Universität Klagenfurt, Austria), Radu Prodan (Alpen-Adria-Universität Klagenfurt, Austria), Souvik Sengupta (iExec Blockchain Tech, France), Anthony Simonet-Boulogne (iExec Blockchain Tech, France), Ioannis Plakas (UBITECH, Greece), Giannis Ledakis (UBITECH, Greece) and Dumitru Roman (University of Oslo and SINTEF AS, Norway)
Abstract: Modern big-data pipeline applications, such as machine learning, encompass complex workflows for real-time data gathering, storage and analysis. Big-data pipelines often have conflicting requirements, such as low communication latency and high computational speed. These require different kinds of computing resource, from cloud to edge, distributed across multiple geographical locations (in other words, the computing continuum). The Horizon 2020 DataCloud project is creating a novel paradigm for big-data pipeline processing over the computing continuum, covering the complete lifecycle of big-data pipelines. To overcome the runtime challenges associated with automating big-data pipeline processing on the computing continuum, we’ve created the DataCloud architecture. By separating the discovery, definition, and simulation of big-data pipelines from runtime execution, this architecture empowers domain experts with little infrastructure or software knowledge to take an active part in defining big-data pipelines.
This work received funding from the DataCloud European Union’s Horizon 2020 research and innovation programme under grant agreement no. 101016835.
Authors: Akif Quddus Khan, Nikolay Nikolov, Mihhail Matskin, Radu Prodan, Dumitru Roman, Bekir Sahin, Christoph Bussler, Ahmet Soylu
Abstract: Big data pipelines are developed to process data characterized by one or more of the three big data features, commonly known as the three Vs (volume, velocity, and variety), through a series of steps (e.g., extract, transform, and move), laying the groundwork for the use of advanced analytics and ML/AI techniques. The computing continuum (i.e., cloud/fog/edge) allows access to a virtually infinite amount of resources, where data pipelines could be executed at scale; however, the implementation of data pipelines on the continuum is a complex task that needs to take computing resources, data transmission channels, triggers, data transfer methods, integration of message queues, etc., into account. The task becomes even more challenging when data storage is considered as part of the data pipelines. Local storage is expensive, hard to maintain, and comes with several challenges (e.g., data availability, data security, and backup). The use of cloud storage, i.e., storage-as-a-service (StaaS), instead of local storage has the potential to provide more flexibility in terms of scalability, fault tolerance, and availability. In this article, we propose a generic approach to integrate StaaS with data pipelines, i.e., computation on an on-premise server or on a specific cloud, but integration with StaaS, and develop a ranking method for available storage options based on five key parameters: cost, proximity, network performance, server-side encryption, and user weights/preferences. The evaluation carried out demonstrates the effectiveness of the proposed approach in terms of data transfer performance, utility of the individual parameters, and feasibility of dynamic selection of a storage option based on four primary user scenarios.
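A ranking over the parameters named in this abstract could be sketched as a weighted sum of normalized scores. This is only an illustration under stated assumptions, not the method of the article: the StorageOption fields, their units, the min-max normalization, and the weight keys are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StorageOption:
    # All fields and units are hypothetical, chosen for illustration only.
    name: str
    cost: float             # e.g. price per GB-month, lower is better
    distance_km: float      # proximity to the compute site, lower is better
    throughput_mbps: float  # network performance, higher is better
    encrypted: bool         # server-side encryption available


def _normalize(values, higher_is_better):
    """Min-max scale values to [0, 1], where 1 is always the best score."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)
    if higher_is_better:
        return [(v - lo) / (hi - lo) for v in values]
    return [(hi - v) / (hi - lo) for v in values]


def rank_storage(options, weights):
    """Return (name, score) pairs sorted from best to worst.

    weights maps 'cost', 'proximity', 'network', 'encryption' to
    non-negative user preferences (assumed keys).
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    cost = _normalize([o.cost for o in options], higher_is_better=False)
    prox = _normalize([o.distance_km for o in options], higher_is_better=False)
    net = _normalize([o.throughput_mbps for o in options], higher_is_better=True)
    enc = [1.0 if o.encrypted else 0.0 for o in options]
    scored = []
    for i, option in enumerate(options):
        score = (
            weights["cost"] * cost[i]
            + weights["proximity"] * prox[i]
            + weights["network"] * net[i]
            + weights["encryption"] * enc[i]
        ) / total
        scored.append((option.name, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


options = [
    StorageOption("A", cost=0.02, distance_km=100, throughput_mbps=500, encrypted=True),
    StorageOption("B", cost=0.01, distance_km=2000, throughput_mbps=100, encrypted=False),
]
equal = {"cost": 1, "proximity": 1, "network": 1, "encryption": 1}
cost_only = {"cost": 1, "proximity": 0, "network": 0, "encryption": 0}
ranking_equal = rank_storage(options, equal)
ranking_cost = rank_storage(options, cost_only)
```

With equal weights the nearby, faster, encrypted option ranks first; when only cost matters, the cheaper option wins. Changing the weights is how a user scenario would be expressed in this sketch.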
Radu Prodan participated in the panel on “Fueling Industrial AI with Data Pipelines” and presented the Graph-Massivizer project at the European Big Data Value Forum on November 22 in Prague, Czech Republic.
Authors: Alexander Lercher, Nishant Saurabh, Radu Prodan
The 15th IEEE International Conference on Social Computing and Networking
Abstract: Community evolution prediction enables business-driven social networks to detect customer groups modeled as communities based on similar interests by splitting them into temporal segments and utilizing ML classification to predict their structural changes. Unfortunately, existing methods overlook business contexts and focus on analyzing customer activities, raising privacy concerns. This paper proposes a novel method for community evolution prediction that applies a context-aware approach to identify future changes in community structures through three complementary features. Firstly, it models business events as transactions, splits them into explicit contexts, and detects contextualized communities for multiple time windows. Secondly, it performs feature engineering using novel structural metrics representing temporal features of contextualized communities. Thirdly, it uses extracted features to train ML classifiers and predict the community evolution in the same context and other dependent contexts. Experimental results on two real-world data sets reveal that traditional ML classifiers using the context-aware approach can predict community evolution with up to three times higher accuracy, precision, recall, and F1-score than other baseline classification methods (i.e., majority class, persistence).
2022 IEEE/ACM 2nd Workshop on Distributed Machine Learning for the Intelligent Computing Continuum (DML-ICC), in conjunction with IEEE/ACM UCC 2022, December 6-9, 2022 | Vancouver, Washington, USA
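The two baselines named at the end of this abstract can be illustrated with a short sketch. It assumes their usual meaning (majority class always predicts the most frequent training label; persistence predicts that each community repeats its previous event); the event labels and function names are hypothetical and not taken from the paper.

```python
from collections import Counter


def majority_class_baseline(train_labels, n_predictions):
    """Predict the most frequent training label for every sample."""
    most_common_label = Counter(train_labels).most_common(1)[0][0]
    return [most_common_label] * n_predictions


def persistence_baseline(previous_labels):
    """Predict that each community repeats the event of its last time window."""
    return list(previous_labels)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical evolution events for four communities.
train = ["grow", "grow", "shrink", "stable"]
previous_window = ["grow", "shrink", "stable", "grow"]
next_window = ["grow", "shrink", "grow", "grow"]

majority_pred = majority_class_baseline(train, len(next_window))
persistence_pred = persistence_baseline(previous_window)
majority_acc = accuracy(next_window, majority_pred)
persistence_acc = accuracy(next_window, persistence_pred)
```

Both baselines ignore any structural or contextual features, which is why they serve as a lower reference point for trained classifiers.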
Authors: Narges Mehran (Alpen-Adria-Universität Klagenfurt) and Radu Prodan (Alpen-Adria-Universität Klagenfurt)
Abstract: Processing rapidly growing data encompasses complex workflows that utilize the Cloud for high-performance computing and the Fog and Edge devices for low-latency communication. For example, autonomous driving applications require inspection, recognition, and classification of road signs for safety inspection assessments, especially on crowded roads. Such applications are among the most prominent research and industrial exploration topics in computer vision and machine learning. In this work, we design a road sign inspection workflow consisting of 1) encoding and framing tasks of video streams captured by camera sensors embedded in the vehicles, and 2) convolutional neural network (CNN) training and inference models for accurate visual object recognition. We explore a matching theoretic algorithm named CODA to place the workflow on the computing continuum, targeting the workflow processing time, data transfer intensity, and energy consumption as objectives. Evaluation results on a real computing continuum testbed federated among four Cloud, Fog, and Edge providers reveal that CODA achieves 50%-60% lower completion time, 33%-59% lower CO2 emissions, and 19%-45% lower data transfer intensity compared to two state-of-the-art methods.
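To give a flavour of matching-theoretic placement, the sketch below runs a generic deferred-acceptance (Gale-Shapley style) matching between workflow tasks and devices with limited capacity. It is not CODA itself, whose details the abstract does not give; the task names, device names, and preference rankings are hypothetical.

```python
def deferred_acceptance(task_prefs, device_rank, capacity):
    """Many-to-one deferred acceptance.

    task_prefs:  task -> list of devices, most preferred first
    device_rank: device -> {task: rank}, lower rank is preferred
    capacity:    device -> maximum number of tasks it can host
    Returns a dict task -> device; tasks rejected everywhere are omitted.
    """
    next_choice = {task: 0 for task in task_prefs}
    assigned = {device: [] for device in capacity}
    free = list(task_prefs)
    while free:
        task = free.pop()
        if next_choice[task] >= len(task_prefs[task]):
            continue  # preference list exhausted, task stays unplaced
        device = task_prefs[task][next_choice[task]]
        next_choice[task] += 1
        assigned[device].append(task)
        if len(assigned[device]) > capacity[device]:
            # Over capacity: the device rejects its least preferred task.
            worst = max(assigned[device], key=lambda t: device_rank[device][t])
            assigned[device].remove(worst)
            free.append(worst)
    return {task: device for device, tasks in assigned.items() for task in tasks}


# Hypothetical example: both tasks prefer the edge device, which has one slot.
task_prefs = {"encode": ["edge", "cloud"], "train": ["edge", "cloud"]}
device_rank = {
    "edge": {"encode": 0, "train": 1},   # edge prefers the lightweight task
    "cloud": {"encode": 1, "train": 0},  # cloud prefers the heavy task
}
capacity = {"edge": 1, "cloud": 1}
placement = deferred_acceptance(task_prefs, device_rank, capacity)
```

In a real placement problem the preference lists would be derived from objectives such as processing time, data transfer, and energy; here they are simply given as input.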
OTEC: An Optimized Transcoding Task Scheduler for Cloud and Fog Environments
Samira Afzal (Alpen-Adria-Universität Klagenfurt), Farzad Tashtarian (Alpen-Adria-Universität Klagenfurt), Hamid Hadian (Alpen-Adria-Universität Klagenfurt), Alireza Erfanian (Alpen-Adria-Universität Klagenfurt), Christian Timmerer (Alpen-Adria-Universität Klagenfurt), and Radu Prodan (Alpen-Adria-Universität Klagenfurt)
Encoding and transcoding videos into multiple codecs and representations is a significant challenge that requires seconds or even days on high-performance computers depending on many technical characteristics, such as video complexity or encoding parameters. Cloud computing offering on-demand computing resources optimized to meet the needs of customers and their budgets is a promising technology for accelerating dynamic transcoding workloads. In this work, we propose OTEC, a novel multi-objective optimization method based on the mixed-integer linear programming model to optimize the computing instance selection for transcoding processes. OTEC determines the type and number of cloud and fog resource instances for video encoding and transcoding tasks with optimized computation cost and time. We evaluated OTEC on AWS EC2 and Exoscale instances for various administrator priorities, the number of encoded video segments, and segment transcoding times. The results show that OTEC can achieve appropriate resource selections and satisfy the administrator’s priorities in terms of time and cost minimization.
OTEC architecture overview.
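The time and cost trade-off behind the instance selection described in the OTEC abstract can be illustrated with a toy example. OTEC uses a mixed-integer linear programming model; the sketch below instead swaps in an exhaustive search over a tiny, hypothetical catalogue of instance types, with a made-up cost model and a priority weight alpha (1 favours time, 0 favours cost).

```python
import math
from itertools import product


def select_instances(types, n_segments, max_count, alpha):
    """Pick an instance type and count by brute force.

    types: name -> (seconds_per_segment, price_per_second), hypothetical values
    alpha: priority in [0, 1]; 1 minimizes time only, 0 minimizes cost only
    Returns (name, count, makespan_seconds, cost).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    candidates = []
    for (name, (seconds, price)), count in product(
        types.items(), range(1, max_count + 1)
    ):
        # Segments are spread evenly; the busiest instance defines the makespan.
        makespan = math.ceil(n_segments / count) * seconds
        # Every instance is billed for the whole makespan (assumed cost model).
        cost = count * makespan * price
        candidates.append((name, count, makespan, cost))
    best_time = min(c[2] for c in candidates)
    best_cost = min(c[3] for c in candidates)

    def objective(candidate):
        # Normalize both terms by their best achievable value before weighting.
        return (alpha * candidate[2] / best_time
                + (1 - alpha) * candidate[3] / best_cost)

    return min(candidates, key=objective)


catalogue = {"small": (10.0, 0.001), "large": (4.0, 0.004)}
fastest = select_instances(catalogue, n_segments=8, max_count=4, alpha=1.0)
cheapest = select_instances(catalogue, n_segments=8, max_count=4, alpha=0.0)
```

Exhaustive search only works for such small catalogues; a solver-based formulation is what makes the approach scale to realistic numbers of instance types and segments.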
As a HiPEAC member, we are hosting Zeinab Bakhshi, a Ph.D. student from Mälardalen University in Sweden. Zeinab was awarded a HiPEAC collaboration grant and is now hosted by Professor Radu Prodan to expand her research on container-based fog architectures. The multi-layer computing continuum infrastructure in the Klagenfurt lab helps Zeinab deploy the use case she is researching. These experiments take her research to the next level. We plan to publish our collaborative work in a series of papers based on the upcoming results.