Authors: Seyedehhaleh Seyeddizaji, Joze Martin Rozanec, Reza Farahani, Dumitru Roman and Radu Prodan
Venue: The 2nd Workshop on Serverless, Extreme-Scale, and Sustainable Graph Processing Systems Co-located with ICPE 2024
Abstract: While graph sampling is key to scalable processing, little research has tried to thoroughly compare and understand how it preserves features such as degree, clustering, and distances dependent on the graph size and structural properties. This research evaluates twelve widely adopted sampling algorithms across synthetic and real datasets to assess their qualities in three metrics: degree, clustering coefficient (CC), and hop plots. We find the random jump algorithm to be an appropriate choice regarding degree and hop-plot metrics and the random node for CC metric. In addition, we interpret the algorithms’ sample quality by conducting correlation analysis with diverse graph properties. We discover eigenvector centrality and path-related features as essential features for these algorithms’ degree quality estimation, node numbers (or the size of the largest connected component) as informative features for CC quality estimation and degree entropy, edge betweenness and path-related features as meaningful features for hop-plot metric. Furthermore, with increasing graph size, most sampling algorithms produce better-quality samples under degree and hop-plot metrics.