Speedup Breadth-First Search by Graph Ordering
Abstract:
Breadth-First Search (BFS) is a core graph algorithm that is widely used for graph analysis. As it is frequently used in many graph applications, improving the BFS performance is essential. In this paper, we present a graph ordering method that could reorder the graph nodes to achieve better data locality, thus, improving the BFS performance. Our method is based on an observation that the sibling relationships will dominate the cache access pattern during the BFS traversal. Therefore, we propose a frequency-based model to construct the graph order. First, we optimize the graph order according to the nodes’ visit frequency. Nodes with high visit frequency will be processed in priority. Second, we try to maximize the child nodes’ overlap layer by layer. As it is proved to be NP-hard, we propose a heuristic method that could greatly reduce the preprocessing overheads.We conduct extensive experiments on 16 real-world datasets. The result shows that our method could achieve comparable performance with the state-of-the-art methods while the graph ordering overheads are only about 1/15.
Keywords: Breadth-first search, BFS, graph ordering, graph algorithm.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 635References:
[1] LAW: The laboratory for web algorithmics. http://http://law.di.unimi.it.
[2] V. Agarwal, F. Petrini, D. Pasetto, and D. A. Bader. Scalable graph exploration on multicore processors. In SC’10: Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1–11. IEEE, 2010.
[3] A. Ailamaki, D. J. DeWitt, M. D. Hill, and D. A. Wood. Dbmss on a modern processor: Where does time go? In VLDB’99, Proceedings of 25th International Conference on Very Large Data Bases, September 7-10, 1999, Edinburgh, Scotland, UK, number CONF, pages 266–277, 1999.
[4] J. Arai, H. Shiokawa, T. Yamamuro, M. Onizuka, and S. Iwamura. Rabbit order: Just-in-time parallel reordering for fast graph analysis. In 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 22–31. IEEE, 2016.
[5] D. A. Bader and K. Madduri. Designing multithreaded algorithms for breadth-first search and st-connectivity on the cray mta-2. In 2006 International Conference on Parallel Processing (ICPP’06), pages 523–530. IEEE, 2006.
[6] V. Balaji and B. Lucia. When is graph reordering an optimization? studying the effect of lightweight graph reordering across applications and input graphs. In 2018 IEEE International Symposium on Workload Characterization (IISWC), 2018.
[7] P. Boldi, B. Codenotti, M. Santini, and S. Vigna. Ubicrawler: A scalable fully distributed web crawler. Software: Practice & Experience, 34(8):711–726, 2004.
[8] P. Boldi, M. Rosa, M. Santini, and S. Vigna. Layered label propagation: A multiresolution coordinate-free ordering for compressing social networks. In Proceedings of the 20th international conference on World wide web, pages 587–596, 2011.
[9] J. Cheng, Z. Shang, H. Cheng, H. Wang, and J. X. Yu. K-reach: who is in your small world. arXiv preprint arXiv:1208.0090, 2012.
[10] B. V. Cherkassky, A. V. Goldberg, and T. Radzik. Shortest paths algorithms: Theory and experimental evaluation. Mathematical programming, 73(2):129–174, 1996.
[11] J. Chhugani, N. Satish, C. Kim, J. Sewall, and P. Dubey. Fast and efficient graph traversal algorithm for cpus: Maximizing single-node efficiency. In 2012 IEEE 26th International Parallel and Distributed Processing Symposium, pages 378–389. IEEE, 2012.
[12] F. Chierichetti, R. Kumar, S. Lattanzi, M. Mitzenmacher, A. Panconesi, and P. Raghavan. On compressing social networks. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 219–228, 2009.
[13] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to algorithms. MIT press, 2009.
[14] E. Cuthill and J. McKee. Reducing the bandwidth of sparse symmetric matrices. In Proceedings of the 1969 24th national conference, pages 157–172, 1969.
[15] S. Dolev, Y. Elovici, and R. Puzis. Routing betweenness centrality. Journal of the ACM (JACM), 57(4):1–27, 2010.
[16] M. Frasca, K. Madduri, and P. Raghavan. Numa-aware graph mining techniques for performance and energy efficiency. In SC’12: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, pages 1–11. IEEE, 2012.
[17] A. George. Nested dissection of a regular finite element mesh. SIAM Journal on Numerical Analysis, 10(2):345–363, 1973.
[18] J. A. George. Computer implementation of the finite element method. Technical report, STANFORD UNIV CA DEPT OF COMPUTER SCIENCE, 1971.
[19] S. Hong, T. Oguntebi, and K. Olukotun. Efficient parallel graph exploration on multi-core cpu and gpu. In 2011 International Conference on Parallel Architectures and Compilation Techniques, pages 78–88. IEEE, 2011.
[20] K. I. Karantasis, A. Lenharth, D. Nguyen, M. J. Garzaran, and K. Pingali. Parallelization of reordering algorithms for bandwidth and wavefront reduction. In SC’14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 921–932. IEEE, 2014.
[21] J. Kunegis. The koblenz network collection. http://konect.uni-koblenz. de, 2017.
[22] K. Lakhotia, S. Singapura, R. Kannan, and V. Prasanna. Recall: Reordered cache aware locality based graph processing. In 2017 IEEE 24th International Conference on High Performance Computing (HiPC), pages 273–282. IEEE, 2017.
[23] D. LaSalle and G. Karypis. Multi-threaded graph partitioning. In 2013 IEEE 27th International Symposium on Parallel and Distributed Processing, pages 225–236. IEEE, 2013.
[24] J. Leskovec and A. Krevl. SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data, June 2014.
[25] Y. Lim, U. Kang, and C. Faloutsos. Slashburn: Graph compression and mining beyond caveman communities. IEEE Transactions on Knowledge and Data Engineering, 26(12):3077–3089, 2014.
[26] K. Madduri, D. Ediger, K. Jiang, D. A. Bader, and D. Chavarria-Miranda. A faster parallel algorithm and efficient multithreaded implementations for evaluating betweenness centrality on massive datasets. In 2009 IEEE International Symposium on Parallel & Distributed Processing, pages 1–8. IEEE, 2009.
[27] J. Malicevic, B. Lepers, and W. Zwaenepoel. Everything you always wanted to know about multicore graph processing but were afraid to ask. In 2017 {USENIX} Annual Technical Conference ({USENIX}{ATC} 17), pages 631–643, 2017.
[28] A. McLaughlin and D. A. Bader. Scalable and high performance betweenness centrality on the gpu. In SC’14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 572–583. IEEE, 2014.
[29] V. Ufimtsev and S. Bhowmick. Application of group testing in identifying high betweenness centrality vertices in complex networks. In Eleventh Workshop on Mining and Learning with Graphs, pages 1–8, 2013.
[30] H. Wei, J. X. Yu, C. Lu, and X. Lin. Speedup graph processing by graph ordering. In Proceedings of the 2016 International Conference on Management of Data, pages 1813–1828, 2016.
[31] Y. Zhang, V. Kiriansky, C. Mendis, S. Amarasinghe, and M. Zaharia. Making caches work for graph analytics. In 2017 IEEE International Conference on Big Data (Big Data), pages 293–302. IEEE, 2017.