[Figure: matrices A, B and C, each drawn as a 4 × 4 grid of sub-matrices labeled by row and column index (A00 to A33, B00 to B33, C00 to C33)]
Fig. 8.9 Parallel multiplication of matrices
data caches. The system metrics associated with the architecture are the following:
cycles, instructions, power dissipation, peak power dissipation and area.
Application. The application used for the following experimental results is a classic
implementation of partitioned matrix multiplication. Each matrix (both sources and
destination) is divided into an equal number of sub-matrices; each node computes a
single partition of the destination matrix by multiplying the rows and columns of
source sub-matrices that partition depends on. Figure 8.9 shows an example matrix
multiplication (C = A × B) for two square matrices divided into 16 sub-matrices
(A_{i,j}, B_{i,j}, C_{i,j}, where i, j ∈ {0, ..., 3}).
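The partitioning scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the benchmark code used in the experiments: the function names (matmul, get_block, compute_partition, partitioned_matmul) are hypothetical, the matrices are assumed square with a size divisible by the number of partitions per side, and a sequential loop stands in for the nodes that would each compute one destination partition C_{i,j} = sum over k of A_{i,k} × B_{k,j}.

```python
def matmul(x, y):
    """Plain triple-loop product of two matrices stored as lists of rows."""
    inner, cols = len(y), len(y[0])
    return [[sum(row[k] * y[k][c] for k in range(inner)) for c in range(cols)]
            for row in x]


def get_block(m, i, j, bs):
    """Return the bs x bs sub-matrix at block position (i, j) of m."""
    return [row[j * bs:(j + 1) * bs] for row in m[i * bs:(i + 1) * bs]]


def compute_partition(a, b, i, j, parts):
    """Work assigned to one (hypothetical) node: build destination block C[i][j].

    The node needs block row i of A and block column j of B.
    """
    bs = len(a) // parts
    acc = [[0] * bs for _ in range(bs)]
    for k in range(parts):
        prod = matmul(get_block(a, i, k, bs), get_block(b, k, j, bs))
        acc = [[acc[r][c] + prod[r][c] for c in range(bs)] for r in range(bs)]
    return acc


def partitioned_matmul(a, b, parts=4):
    """Assemble C = A x B from parts * parts independently computed blocks."""
    n = len(a)
    assert n % parts == 0, "matrix size must be divisible by parts"
    bs = n // parts
    c = [[0] * n for _ in range(n)]
    # Each (i, j) iteration is independent, so it could run on its own node.
    for i in range(parts):
        for j in range(parts):
            block = compute_partition(a, b, i, j, parts)
            for r in range(bs):
                c[i * bs + r][j * bs:(j + 1) * bs] = block[r]
    return c
```

With parts=4 the destination matrix is split into 16 blocks, matching the example of Fig. 8.9. Because no block of C depends on another block of C, the two outer loops carry no data dependency between iterations.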
8.3.5 Response Surface Modeling of Many-Cores
This section presents the validation results for the RSMs described in Chap. 4.
For the sake of brevity, only the results for the following RSM configurations
are reported:
• Linear regression
  - Model order: first,
  - Without any interaction between parameters,
  - Excluding the following design space parameters from metric estimation:
    ICache_ways, DCache_ways, L2Cache_ways, L2Cache_access_latency,
    Memory_size, Memory_access_latency.
• Radial Basis Functions
  - Distance function definition: thin plate spline.
• Splines
  - No parameters.
• Kriging
 