6 Concluding Remarks
We have addressed the computation of the symmetric band matrix-matrix
multiplication on CPU-GPU platforms. Exploiting the structure of the matrix
yields significant savings in both memory and computational cost. Two specific
implementations, sbmm_blk and sbmm_blk+ms, are presented and evaluated. Both
routines leverage the parallelism of the target architecture to deliver
remarkable performance. Routine sbmm_blk adopts the packed storage scheme
defined in BLAS and, consequently, presents some drawbacks that limit its
performance on parallel hardware architectures. Routine sbmm_blk+ms partially
overcomes these problems by relying on a modified packed storage scheme that is
better suited to the underlying architecture, at the cost of a minor increase
in the memory requirements. The experimental evaluation shows that both
routines clearly outperform the naive implementations based on the kernels in
MKL and CUBLAS for this operation.
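To make the memory and work savings concrete, the following minimal sketch packs a symmetric band matrix into the conventional BLAS lower band layout and multiplies it by a dense matrix while touching only the stored entries. It is an unoptimized reference in plain Python, not the sbmm_blk or sbmm_blk+ms routines, and it does not reproduce the modified storage scheme, which this section does not describe. The names pack_lower_band and sbmm_reference are hypothetical and introduced only for illustration.

```python
def pack_lower_band(a, k):
    """Pack the lower triangle of a symmetric band matrix with k
    sub-diagonals into a (k + 1) x n array, following the BLAS lower
    band convention: ab[i - j][j] = a[i][j] for j <= i <= j + k."""
    n = len(a)
    ab = [[0.0] * n for _ in range(k + 1)]
    for j in range(n):
        for i in range(j, min(n, j + k + 1)):
            ab[i - j][j] = a[i][j]
    return ab


def sbmm_reference(ab, k, b):
    """Compute C = A * B, where A is symmetric band (given by ab, k)
    and B is a dense n x m matrix stored as a list of rows."""
    n = len(ab[0])
    m = len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for j in range(n):
        for d in range(min(k, n - 1 - j) + 1):
            i = j + d
            aij = ab[d][j]
            for col in range(m):
                c[i][col] += aij * b[j][col]
                if d > 0:
                    # Mirrored upper-triangle entry: a[j][i] == a[i][j].
                    c[j][col] += aij * b[i][col]
    return c


if __name__ == "__main__":
    # Hypothetical 4 x 4 tridiagonal example (k = 1).
    a = [[2, -1, 0, 0],
         [-1, 2, -1, 0],
         [0, -1, 2, -1],
         [0, 0, -1, 2]]
    ab = pack_lower_band(a, 1)
    print("dense entries:", len(a) * len(a), "band entries:", len(ab) * len(ab[0]))
    print(sbmm_reference(ab, 1, [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]))
```

For an n x n matrix with k sub-diagonals, this layout stores (k + 1) n entries instead of n^2, and the product visits each stored entry once per column of B.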
Additionally, we have developed a symmetric band matrix-vector routine,
sbmv_ms, that exploits the benefits of the modified storage scheme introduced
with sbmm_blk+ms. Specifically, this new routine delivers higher performance
than its counterpart from CUBLAS. Although our solution requires an additional
effort to transform the matrix to the modified storage scheme, we believe it
may be useful in methods that perform several symmetric band matrix-vector
products involving the same matrix, such as iterative Krylov subspace-based
solvers for symmetric band linear systems.
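The amortization argument can be illustrated with a small Krylov solver that packs the matrix once and then reuses it in every iteration. The sketch below uses the conjugate gradient method, which additionally assumes the matrix is positive definite, together with the conventional BLAS lower band layout rather than the modified scheme used by sbmv_ms. The function names sbmv_reference and cg_band are hypothetical.

```python
def sbmv_reference(ab, k, x):
    """Compute y = A * x for a symmetric band matrix A held in BLAS
    lower band storage: ab[d][j] = a[j + d][j] for 0 <= d <= k."""
    n = len(x)
    y = [0.0] * n
    for j in range(n):
        y[j] += ab[0][j] * x[j]
        for d in range(1, min(k, n - 1 - j) + 1):
            i = j + d
            y[i] += ab[d][j] * x[j]
            y[j] += ab[d][j] * x[i]  # symmetric counterpart
    return y


def cg_band(ab, k, b, tol=1e-12, max_iter=200):
    """Conjugate gradient for A x = b, assuming A is symmetric positive
    definite. The packed matrix ab is built once by the caller and reused
    by one band matrix-vector product per iteration."""
    x = [0.0] * len(b)
    r = list(b)          # residual for the initial guess x = 0
    p = list(r)
    rs = sum(v * v for v in r)
    for _ in range(max_iter):
        if rs ** 0.5 <= tol:
            break
        ap = sbmv_reference(ab, k, p)
        alpha = rs / sum(pi * api for pi, api in zip(p, ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = sum(v * v for v in r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


if __name__ == "__main__":
    # Hypothetical 4 x 4 tridiagonal system (k = 1), already packed.
    ab = [[2.0, 2.0, 2.0, 2.0], [-1.0, -1.0, -1.0, 0.0]]
    print(cg_band(ab, 1, [0.0, 0.0, 0.0, 5.0]))
```

Because the conversion cost is paid once while the matrix-vector product runs in every iteration, any one-time repacking step becomes cheaper in relative terms as the iteration count grows.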
Acknowledgements. Ernesto Dufrechou and Pablo Ezzatti acknowledge support
from Programa de Desarrollo de las Ciencias Básicas, and Agencia Nacional de
Investigación e Innovación, Uruguay. Enrique S. Quintana-Ortí was supported by
project TIN2011-23283 of the Ministry of Science and Competitiveness (MINECO)
and EU FEDER, and project P1-1B2013-20 of the Fundació Caixa Castelló-Bancaixa
and UJI.