Commenced in January 2007
Paper Count: 31103
Accelerating Sparse Matrix Vector Multiplication on Many-Core GPUs
Abstract:Many-core GPUs provide high computing ability and substantial bandwidth; however, optimizing irregular applications like SpMV on GPUs becomes a difficult but meaningful task. In this paper, we propose a novel method to improve the performance of SpMV on GPUs. A new storage format called HYB-R is proposed to exploit GPU architecture more efficiently. The COO portion of the matrix is partitioned recursively into a ELL portion and a COO portion in the process of creating HYB-R format to ensure that there are as many non-zeros as possible in ELL format. The method of partitioning the matrix is an important problem for HYB-R kernel, so we also try to tune the parameters to partition the matrix for higher performance. Experimental results show that our method can get better performance than the fastest kernel (HYB) in NVIDIA-s SpMV library with as high as 17% speedup.
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1057071Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1721
 E.-J. Im, "Optimizing the performance of sparse matrix-vector multiplication", PhD thesis, University of California, Berkeley, May 2000.
 R. Vuduc, " Automatic Performance Tuning of Sparse Matrix Kernels", PhD thesis, University of California, Berkeley, December 2003.
 S. Williams , "Auto-tuning Performance on Multicore Computers", PhD thesis, University of California, Berkeley, 2008.
 Sam Williams, Richard Vuduc, Leonid Oliker, John Shalf, Katherine Yelick, and James Demmel, "Optimizing sparse matrix-vector multiply on emerging multicore platforms," Journal of Parallel Computing, vol. 35, no. 3, pp.178-194, March 2009.
 Jeff Bolz, Ian Farmer, et al. , "Sparse matrix solvers on the GPU: Conjugate gradients and multigrid," In Proc. Special Interest Group on Graphics Conf. (SIGGRAPH), San Diego, CA, USA, July 2003.
 Shubhabrata Sengupta, Mark Harris, Yao Zhang, and John D. Owens, "Scan primitives for GPU computing," In Proc. ACM Dense Protein QCD Cantilever Spheres Harbor Ship Wind Tunnel SIGGRAPH/EUROGRAPHICS Symp. Graphics Hardware, San Diego, CA, USA, 2007.
 Nathan Bell and Michael Garland, "Efficient sparse matrix-vector multiplication on CUDA," In Proc. ACM/IEEE Conf. Supercomputing (SC), Portland, OR, USA, November 2009.
 Muthu Manikandan Baskaran and Rajesh Bordawekar, "Optimizing sparse matrix-vector multiplication on GPUs using compile-time and run-time strategies," Technical Report RC24704 (W0812-047), IBM T.J.Watson Research Center, Yorktown Heights, NY, USA, December 2008.
 Ali Cevahir , Akira Nukada , Satoshi Matsuoka, "Fast Conjugate Gradients with Multiple GPUs," Proceedings of the 9th International Conference on Computational Science: Part I, Baton Rouge, LA, May 25-27, 2009.
 F. Vazquez, E. M. Garzon, J.A.Martinez, and J.J.Fernandez, "The sparse matrix vector product on GPUs," Technical report, University of Almeria, June 2009.
 Monakov, A., A. Lokhmotov, and A. Avetisyan, "Automatically tuning sparse matrix-vector multiplication for GPU architectures," High Performance Embedded Architectures and Compilers, Lecture Notes in Computer Science, Vol. 5952, pp.111-125, 2010.
 J. W. Choi, A. Singh, and R. Vuduc, "Model-driven autotuning of sparse matrix-vector multiply on gpus," In PPOPP, pp.115-126, 2010.
 Ping Guo, Liqiang Wang, "Auto-Tuning CUDA Parameters for Sparse Matrix-Vector Multiplication on GPUs," The 2010 International Conference on Computational and Information Sciences, Chengdu, China.
 A. Bulu├º, J. T. Fineman, M. Frigo, J. R. Gilbert, and E. Leiserson, "Parallel sparse matrix-vector and matrixtranspose-vector multiplication using compressed sparse blocks," In SPAA, pp. 233-244, 2009.
 A. Bulu├º, et.al., "Reduced-Bandwidth Multithreaded Algorithms for Sparse Matrix-Vector Multiplication," In IPDPS, 2011.
 K. Kourtis, V. Karakasis, G. Goumas, and N. Koziris, "CSX: An Extended Compression Format for SpMV on Shared Memory Systems,"In PPoPP, pp. 247-256, San Antonio, Texas, USA, 2011.
 Xintian Yang, Srinivasan Parthasarathy, P. Sadayappan, "Fast Sparse Matrix-Vector Multiplication on GPUs: Implications for Graph Mining," In VLDB, 2011.
 NVIDIA CUDA (Compute Unified Device Architecture): Programming Guide, Version 2.1, December 2008.
 J. Willcock, A. Lumsdaine, "Accelerating sparse matrix computations via data compression," In ICS, Cairns, Australia, June 2006.