Table 1. Computed values for g and L parameters for the studied architectures

dell32
level   g (flops/word)   L (flops)
2       977.5            15550.2
1       334.9            7792.9

jolly
level   g (flops/word)   L (flops)
3       1315.9           16184.4
2       549.9            7157.9
1       105.3            498.2
Finally, using the least-squares method, we estimate the values of g_i and L_i
over the h-communications for each level i. The final values for dell32 and jolly
are reported in Table 1.
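The estimation step can be illustrated with a minimal sketch. It assumes the standard BSP charge T(h) = g·h + L for an h-relation at a given level, with times already expressed in flop units as in Table 1, and fits g (slope) and L (intercept) by ordinary linear least squares over measured (h, time) pairs, separately for each level. The function name `fit_g_L` and its input format are hypothetical, not part of the benchmark.

```python
from typing import Sequence, Tuple


def fit_g_L(h: Sequence[float], t: Sequence[float]) -> Tuple[float, float]:
    """Fit t ~ g*h + L by ordinary least squares (hypothetical helper).

    h: sizes of the h-relations measured at one level (words)
    t: corresponding measured times, normalized to flop units
    Returns (g, L): slope in flops/word, intercept in flops.
    """
    n = len(h)
    if n != len(t) or n < 2:
        raise ValueError("need at least two (h, t) pairs of equal length")
    mean_h = sum(h) / n
    mean_t = sum(t) / n
    # Closed-form simple linear regression: slope = cov(h, t) / var(h)
    s_ht = sum((hi - mean_h) * (ti - mean_t) for hi, ti in zip(h, t))
    s_hh = sum((hi - mean_h) ** 2 for hi in h)
    if s_hh == 0:
        raise ValueError("h values must not all be equal")
    g = s_ht / s_hh
    L = mean_t - g * mean_h
    return g, L
```

As a sanity check on synthetic points generated exactly from g = 100 and L = 500 (not measurements from dell32 or jolly), `fit_g_L([1, 2, 4, 8, 16], [600, 700, 900, 1300, 2100])` recovers g ≈ 100 and L ≈ 500.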
4.3 Validation of Results
To validate the results computed in the previous subsection, we conducted an
experiment using a real application: the vector inner product from BSPedupack
(more precisely, the computation of the norm of a vector), written in the
MultiBSP programming model as described in Algorithm 1.3. As future work, we
plan to extend the validation to a set of benchmark applications.
innerProduct(level, vector) {
    if (level.next == NULL) {
        // Leaf level: compute the partial result sequentially
        return sequentialInnerProduct(vector);
    } else {
        // Spawn one thread per child of this level
        begin_parallel_multibsp(level.sons.length)
            ownslice = split_vector(vector, multibsp_pid);
            sublevel = level.sons[multibsp_pid];
            sync();
            // Recurse into the subtree assigned to this thread
            results = innerProduct(sublevel, ownslice);
            sync();
            // The master reduces the partial results and returns them upward
            if (multibsp_pid == master) {
                return sequentialInnerProduct(results);
            }
        end_parallel_multibsp
    }
}

MBSPTree = MBSPDiscover();
innerProduct(MBSPTree, data_vector);
Algorithm 1.3. Vector Inner Product.
Algorithm 1.3 applies the MultiBSP programming model recursively, traversing the
MBSPTree obtained with MBSPDiscover in the proposed benchmark. Using the tree
structure, the data vector is split into one slice per thread at level i. For
i > 0, the splitting is applied recursively; at level 0, a sequential inner
product algorithm computes a partial result. After all threads at a level
synchronize, the master thread performs a reduction phase, combining the partial
results with the sequential inner product routine, and returns the combined value
to the upper level. At the root of the tree, this value is the inner product of
the whole data vector.
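The recursion can be sketched in plain Python to make the data flow concrete. This is a sequential simulation under stated assumptions, not the benchmark's implementation: `Level`, `split_vector`, `inner_product`, and `norm` are hypothetical names, each child of a node stands in for one Multi-BSP thread, and the children run one after another, so the `sync()` barriers are implicit. The sketch takes two vectors (the norm passes the same vector twice) and combines the partial results at each level with a plain sum.

```python
import math
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class Level:
    """Hypothetical Multi-BSP tree node; a node with no sons is a leaf."""
    sons: List["Level"] = field(default_factory=list)


def sequential_inner_product(x: Sequence[float], y: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(x, y))


def split_vector(x: Sequence[float], parts: int) -> List[Sequence[float]]:
    # Block distribution: the first (len(x) % parts) slices get one extra element.
    q, r = divmod(len(x), parts)
    slices, start = [], 0
    for p in range(parts):
        end = start + q + (1 if p < r else 0)
        slices.append(x[start:end])
        start = end
    return slices


def inner_product(level: Level, x: Sequence[float], y: Sequence[float]) -> float:
    if not level.sons:
        # Leaf level: compute the partial result sequentially.
        return sequential_inner_product(x, y)
    k = len(level.sons)
    xs, ys = split_vector(x, k), split_vector(y, k)
    # Each son plays the role of one thread; here they run in sequence.
    partial = [inner_product(son, xs[p], ys[p]) for p, son in enumerate(level.sons)]
    # Reduction by the "master": sum the partial inner products.
    return sum(partial)


def norm(level: Level, x: Sequence[float]) -> float:
    return math.sqrt(inner_product(level, x, x))
```

For example, on a hypothetical two-level tree with two nodes of four leaves each, `tree = Level([Level([Level() for _ in range(4)]) for _ in range(2)])`, the call `norm(tree, [3.0, 4.0])` returns 5.0; slices that end up empty simply contribute 0 to the sum.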
The validation involves the following steps (applied for different vector sizes):
1. Estimate the amount of communication and the number of synchronizations at
each level using hardware counters.
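One way such per-level counts could be combined with Table 1 is sketched below, assuming each superstep at level i is charged h·g_i + L_i, so a level whose supersteps have h-relations summing to H_i words and S_i synchronizations costs H_i·g_i + S_i·L_i flops of communication. The helper `predicted_comm_cost` is hypothetical, the level-to-value mapping follows our reading of the dell32 rows of Table 1, and the counts in the usage example are made up for illustration, not measured.

```python
from typing import Dict, Tuple

# (g in flops/word, L in flops) per level, as read from the dell32 rows of Table 1.
DELL32: Dict[int, Tuple[float, float]] = {
    2: (977.5, 15550.2),
    1: (334.9, 7792.9),
}


def predicted_comm_cost(params: Dict[int, Tuple[float, float]],
                        words: Dict[int, float],
                        syncs: Dict[int, int]) -> float:
    """Hypothetical helper: sum over levels of H_i * g_i + S_i * L_i (flops).

    words[i]: sum over the supersteps at level i of the h-relation size (words)
    syncs[i]: number of synchronizations at level i
    """
    total = 0.0
    for level, (g, L) in params.items():
        total += words.get(level, 0) * g + syncs.get(level, 0) * L
    return total
```

With illustrative counts `words={2: 8, 1: 16}` and `syncs={2: 2, 1: 2}`, the predicted communication cost on dell32 is about 59864.6 flops (7820 + 31100.4 + 5358.4 + 15585.8).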
 