MBSPDiscover: An Automatic Benchmark for MultiBSP Performance Analysis - High Performance Computing


22          if (master) {
23              times.append(t*rate/NITERS);
24          }
25      }
26      level.g, level.L = leastSquares(times)
27      return (level.g, level.L)
28  }

Algorithm 1.2. coreBenchmark function.

Then a synchronization for the current level is performed (line 5) to ensure that all threads have the computing rate value.

The coreBenchmark function measures a full h-communication, which we define as the extension of an h-relation to the shared-memory case within a single node. It is implemented as a communication in which every core writes/reads exactly h data words. We consider the worst case, measuring the slowest possible communication by cyclically reading single data words into other processors. In that way, the values of g_i and L_i computed by the benchmark are pessimistic, and the real values will always be better. The variable h represents the largest number of words read or written in the shared memory of the level. HMAX is the maximum value of h used in the communication patterns for each level. It may need to differ between levels of the hierarchy; we plan to find suitable values by trial and error.
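As an illustration of the cyclic pattern described above, the following minimal sketch builds, for each core, the list of cores it reads single words from. The function name cyclic_pattern and the choice of visiting neighbours in increasing order are assumptions made for this example, not the paper's initCommunicationPattern routine. Under these assumptions every core performs exactly h reads and is the source of exactly h reads, so the pattern is a full h-relation.

```python
def cyclic_pattern(num_cores, h):
    """Return pattern[p]: the h source cores that core p reads one word from.

    Hypothetical helper for illustration. Core p visits its neighbours
    p+1, p+2, ... (mod num_cores) cyclically and never reads from itself.
    """
    if num_cores < 2:
        raise ValueError("need at least two cores to communicate")
    pattern = []
    for p in range(num_cores):
        # Offsets cycle through 1 .. num_cores-1, so the source is never p.
        sources = [(p + 1 + k % (num_cores - 1)) % num_cores for k in range(h)]
        pattern.append(sources)
    return pattern


if __name__ == "__main__":
    for core, sources in enumerate(cyclic_pattern(4, 3)):
        print(core, sources)
```

Because each read targets a different remote core than the previous one, consecutive accesses cannot be combined into one bulk transfer, which is what makes this pattern a pessimistic case.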

The communication times using the h-communication pattern are initialized by the initCommunicationPattern routine (line 7). This process is repeated NITERS times (lines 10-13), because a single operation is too fast to be measured with proper precision. After that, the master thread in each level saves the flops used for each h-communication (line 16).
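The repeat-and-average step can be sketched as follows. This is only an illustration: time_in_flops and its injectable clock parameter are hypothetical names, and the conversion t*rate/NITERS is taken from the listing, with rate assumed to be the computing rate in flops per second.

```python
import time


def time_in_flops(operation, niters, rate, clock=time.perf_counter):
    """Average cost of one call to `operation`, expressed in flops.

    `rate` is assumed to be the computing rate in flops per second.
    `clock` is injectable so the function can be checked deterministically.
    """
    start = clock()
    for _ in range(niters):
        operation()
    elapsed = clock() - start
    # Same conversion as in the listing: t * rate / NITERS.
    return elapsed * rate / niters


if __name__ == "__main__":
    cost = time_in_flops(lambda: sum(range(100)), niters=10_000, rate=1e9)
    print(f"approximate cost per call: {cost:.1f} flops")
```

Timing the whole loop once and dividing by the iteration count keeps the timer's resolution and call overhead from dominating the measurement of a very short operation.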

Finally, the parameters g and L are computed using a traditional least squares approximation method (line 19) that fits the data to a linear model, following related work [1,6], and provides an accurate approximation of g_i and L_i.
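A minimal sketch of such a fit is given below, assuming the linear model has the usual BSP form T(h) = g*h + L, so that g is the slope and L the intercept. The function least_squares and its two-argument signature are assumptions for this example; the listing's leastSquares takes only the measured times.

```python
def least_squares(hs, times):
    """Fit times ~ g * h + L by ordinary least squares and return (g, L).

    Hypothetical helper: `hs` are the h values used, `times` the measured
    costs (for example in flops) for each of them.
    """
    n = len(hs)
    if n < 2 or n != len(times):
        raise ValueError("need at least two (h, time) pairs of equal length")
    mean_h = sum(hs) / n
    mean_t = sum(times) / n
    sxx = sum((h - mean_h) ** 2 for h in hs)
    if sxx == 0:
        raise ValueError("all h values are identical; slope is undefined")
    sxy = sum((h - mean_h) * (t - mean_t) for h, t in zip(hs, times))
    g = sxy / sxx              # slope: cost per word communicated
    L = mean_t - g * mean_h    # intercept: fixed synchronization cost
    return g, L


if __name__ == "__main__":
    # Synthetic data generated from g = 3, L = 10.
    print(least_squares([1, 2, 3, 4], [13, 16, 19, 22]))
```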

3.3 Methodology for the Empirical Evaluation of h-Communications

The methodology applied to measure the h-communications and then estimate the parameters g and L is based on measuring the implementation of MultiBSP operations. We refer to MultiBSP operations as the functions/procedures needed to implement an algorithm designed with the MultiBSP computational model. In our software design, the MBSP operations module contains the implementation of these functions, including operations provided by the MulticoreBSP for C library [9]. This library establishes a methodology for programming according to the MultiBSP computational model.

The software design shown in Fig. 3 is important here because, when MultiBSP algorithms are programmed using other libraries, it is possible to reconfigure the tool by changing the MBSP operations module and re-characterizing the architecture by running the benchmark with this new configuration.
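The idea of swapping the operations module can be sketched as a driver that depends only on a small interface. Everything here is hypothetical: the class FakeOps, its methods communicate and sync, and the driver characterize are illustrative names and do not come from the tool or from MulticoreBSP for C.

```python
class FakeOps:
    """Hypothetical operations module that records calls instead of communicating."""

    def __init__(self, name):
        self.name = name
        self.log = []

    def communicate(self, h):
        self.log.append(("communicate", h))

    def sync(self):
        self.log.append(("sync",))


def characterize(ops, h_values):
    """Drive whichever operations module is plugged in; the driver itself never changes."""
    for h in h_values:
        ops.communicate(h)
        ops.sync()
    return ops.log


if __name__ == "__main__":
    print(characterize(FakeOps("library-a"), [1, 2]))
```

Replacing FakeOps with another object that offers the same two methods would re-run the same measurements against a different library, which is the kind of reconfiguration the paragraph above describes.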
