Information Technology Reference
In-Depth Information
Algorithm: [ E ]:=sbmm BLK inner ( E,A,D,k )
,Aₒ
,Dₒ
E T
E M
E B
A TL
A ML A MR
A BR
D T
D M
D B
Partition E ₒ
where E T ,D T have 0 elements; A TL is 0 × 0and E M , A ML have k rows
while m ( E T ) <m ( E ) do
Determine block size b
Repartition
E 0
E 1
E 2
E 3
E 4
A 00
A 10
D D D D D 4
E T
E M
E B
A TL
A ML A MR
A BR
D T
D M
D B
A 11
,
A 20
A 21 A 22
A 31 A 32
A 42
,
where
D 1 has b rows;
E 1 has b rows;
E 3 has 0 rows if m ( D 0 ) > ( n ( A ) − k − 1) and has b rows otherwise;
A 11
is b × b ;
A 31
is empty if m ( D 0 ) > ( n ( A )
k
1) and is b
×
b otherwise
E 1 := E 1 + A 11 ·
D 1
E 1 := E 1 + A T
21 · D 2
E 1 := E 1 + A T
31 · D 3
E 2 := E 2 + A 21 · D 1
E 3 := E 3 + A 31 · D 1
Continue with
E 0
E 1
E 2
E 3
E 4
A 00
A 10 A 11
A 20 A 21
D D D D D 4
E T
E M
E B
A TL
A ML A MR
A BR
D T
D M
D B
,
A 22
,
A 31
A 32
A 42
endwhile
Fig. 3. Inner loop of Algorithm sbmm BLK that computes C := A · B + C
4.1
Implementation sbmm blk
Routine sbmm blk is an implementation of Algorithm sbmm BLK with all the
computations performed via the appropriate CUBLAS kernel. Initially, all the
matrices are sent totheGPU.Then, ʲC is computed via the corresponding
CUBLAS routine. This is not a BLAS-3 operation, but it presents a relatively
small computational cost and can be eciently computed on the accelerator.
Next, the operations in sbmm BLK are executed; and finally C is transferred
back to the CPU. The update of C 1 requires three matrix-matrix products: one
involving a symmetric matrix (block A 11 ), a second with an upper triangular
matrix (block A 31 ); and the last one with two general matrices. CUBLAS pro-
vides specific routines for all these operations. Blocks C 2 and C 3 are updated via
a product of two general matrices and a product of an upper triangular matrix
times a general matrix, respectively.
The use of CUBLAS routines also presents a drawback, as the kernel that
implements the product of a triangular matrix times a general matrix (routine
trmm) indeed computes
C = ʱop ( A ) B,
(3)
Search WWH ::




Custom Search