Please use this identifier to cite or link to this item:
https://idr.l2.nitk.ac.in/jspui/handle/123456789/8822
Title: | Predicting an optimal sparse matrix format for SpMV computation on GPU |
Authors: | Neelima, B. Ram Mohana Reddy, Guddeti Raghavendra, P.S. |
Issue Date: | 2014 |
Citation: | Proceedings - IEEE 28th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2014, 2014, Vol., , pp.1427-1436 |
Abstract: | Many-threaded architecture based Graphics Processing Units (GPUs) are good for general purpose computations for achieving high performance. The processor has latency hiding mechanism through which it hides the memory access time in such a way that when one warp (group of 32 threads) is computing, the other warps perform memory bound access. But for memory access bound irregular applications such as Sparse Matrix Vector Multiplication (SpMV), memory access times are high and hence improving the performance of such applications on GPU is a challenging research issue. Further, optimizing SpMV time on GPU is an important task for iterative applications like jacobi and conjugate gradient. However, there is a need to consider the overheads caused while computing SpMV on GPU. Transforming the input matrix to a desired format and communicating the data from CPU to GPU are non-trivial overheads associated with SpMV computation on GPU. If the chosen format is not suitable for the given input sparse matrix then desired performance improvements cannot be achieved. Motivated by this observation, this paper proposes a method to chose an optimal sparse matrix format, focusing on the applications where CPU to GPU communication time and pre-processing time are nontrivial. The experimental results show that the predicted format by the model matches with that of the actual high performing format when total SpMV time in terms of pre-processing time, CPU to GPU communication time and SpMV computation time on GPU, is taken into account. The model predicts an optimal format for any given input sparse matrix with a very small overhead of prediction within an application. Compared to the format to achieve high performance only on GPU, our approach is more comprehensive and valuable. This paper also proposes to use a communication and pre-processing overhead optimizing sparse matrix format to be used when these overheads are non trivial. � 2014 IEEE. |
URI: | https://idr.nitk.ac.in/jspui/handle/123456789/8822 |
Appears in Collections: | 2. Conference Papers |
Files in This Item:
There are no files associated with this item.
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.