(#3), eight components from level 2 are grouped. They share the RAM memory,
with a size of 128 GB, as specified by
$\mathit{tuple}_3 = \langle p_3 = 8,\ m_3 = 128\,\mathrm{GB},\ g_3,\ L_3 \rangle$.
Finally, using the same procedure we previously applied to the dell32
architecture (i.e., joining all tuples and discarding level 0), we get the MultiBSP
specification in Eq. 2.
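$$M_2 = [\langle p_1 = 2,\ m_1 = 2\,\mathrm{MB},\ g_1,\ L_1 \rangle,\ \langle p_2 = 4,\ m_2 = 6\,\mathrm{MB},\ g_2,\ L_2 \rangle,\ \langle p_3 = 8,\ m_3 = 128\,\mathrm{GB},\ g_3,\ L_3 \rangle] \tag{2}$$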
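As an illustration only, the specification in Eq. 2 can be held in a small data structure. The sketch below is not taken from the paper: the `Level` class, the `total_leaf_components` helper, and the choice of binary units (1 MB = 2^20 bytes) are assumptions made for the example. The g and L fields stay unset because they are only known after benchmarking.

```python
from dataclasses import dataclass
from math import prod
from typing import Optional

MB = 2**20  # assumption: binary units
GB = 2**30


@dataclass(frozen=True)
class Level:
    """One tuple <p, m, g, L> of a MultiBSP specification (hypothetical helper)."""
    p: int                     # number of lower-level components grouped here
    m_bytes: int               # memory shared by those components, in bytes
    g: Optional[float] = None  # communication parameter, set after benchmarking
    L: Optional[float] = None  # synchronization parameter, set after benchmarking


# Eq. 2 (jolly), levels 1 to 3; level 0 is discarded as in the text.
M2 = [
    Level(p=2, m_bytes=2 * MB),
    Level(p=4, m_bytes=6 * MB),
    Level(p=8, m_bytes=128 * GB),
]


def total_leaf_components(spec):
    """Number of level-0 components implied by the tree: the product of all p_i."""
    return prod(level.p for level in spec)


print(total_leaf_components(M2))  # 2 * 4 * 8 = 64
```

Keeping the levels in an ordered list mirrors the tree: entry i groups `p` copies of the subtree described by the entries before it.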
Using these instances of the MultiBSP model, we can predict the running time
of a MultiBSP algorithm executed on each machine. The $g_i$ and $L_i$ parameters
in each tuple must first be calculated using the benchmarking procedure
explained in the previous section. The next section reports the values of $g$ and $L$
obtained for both architectures at each level.
4.2 Results
We report the time to perform $h$-communications at each level, increasing
$h$ as in the coreBenchmark function. Reporting the flops for each
$h$-communication is important because we compute $g_i$ and $L_i$ with a
least-squares fit at each level.
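The least-squares step can be sketched as follows, assuming the usual linear BSP cost model in which the time for an $h$-communication is approximately $g \cdot h + L$. The paper's exact fitting code is not shown here, so the function name `fit_g_and_L` and the data are hypothetical; the measurements are synthetic and chosen only so the result is easy to check.

```python
def fit_g_and_L(hs, times):
    """Ordinary least squares for times ~ g * h + L; returns (g, L)."""
    n = len(hs)
    if n < 2 or n != len(times):
        raise ValueError("need at least two (h, time) pairs of equal length")
    mean_h = sum(hs) / n
    mean_t = sum(times) / n
    sxx = sum((h - mean_h) ** 2 for h in hs)
    if sxx == 0:
        raise ValueError("all h values are identical; the slope is undefined")
    sxy = sum((h - mean_h) * (t - mean_t) for h, t in zip(hs, times))
    g = sxy / sxx            # slope: cost per communicated word
    L = mean_t - g * mean_h  # intercept: fixed cost when h = 0
    return g, L


# Synthetic measurements (not from the paper): exactly 2.5 * h + 100.
hs = [0, 64, 128, 192, 256]
times = [2.5 * h + 100.0 for h in hs]

g, L = fit_g_and_L(hs, times)
print(f"g = {g:.3f}, L = {L:.3f}")
```

One such fit would be run per level, using the measurements taken at that level, to obtain the corresponding pair $(g_i, L_i)$.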
(a) Instance #1: dell32
(b) Instance #2: jolly
Fig. 7. Time to perform $h$-communications per level in a MultiBSP tree, with $h$
between 0 and 256
Figure 7 shows the $h_i$-communications at each level for dell32 (levels 1 and 2)
and jolly (levels 1, 2, and 3). In level 1 of dell32, the communications stay within
the shared memory (L3 cache), so they are twice as fast as in level 2, which
uses the RAM memory. For jolly, the communications in level 1 stay within the
L2 cache, so they are three times faster than in level 2, where communications
are performed through the L3 cache. In turn, level 2 communications are 1.5×
faster than those in level 3 of the hierarchy, which are performed by accessing
the RAM memory.