[4] has another benchmarking program, bspprobe, which measures optimistic g
values using larger packets instead of single words. BSP benchmarking can also
be done with mpibench from MPIedupack [4].
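Classic BSP benchmarks time h-relations for a range of h values. A common way to turn such timings into the two BSP parameters is to fit the linear cost T(h) = hg + L to the measurements. The sketch below shows only that fitting step, as an ordinary least-squares regression. It is not the code of bspprobe, mpibench or MBSPDiscover. The function name `fit_g_and_L` and the synthetic data are hypothetical.

```python
from typing import Sequence, Tuple


def fit_g_and_L(h_values: Sequence[float], times: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of the BSP cost T(h) = g * h + L (hypothetical helper).

    h_values: sizes h of the benchmarked h-relations (words per processor).
    times:    measured time of each h-relation, e.g. in flop units.
    Returns the slope g and the intercept L of the regression line.
    """
    n = len(h_values)
    if n != len(times) or n < 2:
        raise ValueError("need at least two (h, T) pairs of equal length")
    mean_h = sum(h_values) / n
    mean_t = sum(times) / n
    s_hh = sum((h - mean_h) ** 2 for h in h_values)
    if s_hh == 0:
        raise ValueError("h values must not all be equal")
    s_ht = sum((h - mean_h) * (t - mean_t) for h, t in zip(h_values, times))
    g = s_ht / s_hh          # slope: cost per communicated word
    L = mean_t - g * mean_h  # intercept: cost of an empty superstep
    return g, L


# Synthetic, noise-free measurements (not real benchmark data): T(h) = 100 + 5h.
hs = list(range(11))
ts = [100.0 + 5.0 * h for h in hs]
print(fit_g_and_L(hs, ts))  # (5.0, 100.0)
```

With real timings the points scatter around the line. The fitted intercept then absorbs synchronization overhead, and the slope absorbs per-word cost, which is why benchmarks usually sweep many h values rather than a single one.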
Benchmarking of the MultiBSP computational model was recently addressed by
Savadi and Hossein [6], using an approach similar to the one we apply here.
Classic BSP benchmarking is used as a baseline, but the model instance is
specified differently. Unlike the methodology followed in our work, the authors
consider low-level architectural details such as cache coherency, for instance
for the propagation of values through the memory hierarchy. In their approach,
results are analyzed by comparing the values obtained by benchmarking against
theoretical values of the g and L parameters, which are computed as optimistic
lower bounds (i.e. the authors assume that memory usage always stays below the
cache size and that all cores run at maximum speed). Our approach differs in
that we make no assumption about the underlying hardware platform, but rather
hide its characteristics inside the output of well-chosen benchmarks. We believe
this strategy is well suited to modern architectures, which are too complex to
be modeled precisely because of their advanced, hidden and/or rarely
well-documented features.
From a practical point of view, the main advantage of our proposal is that it
evaluates real MultiBSP operations implemented in the MulticoreBSP for C
library [9]. In addition, our results are validated with a real MultiBSP
program, by comparing the measured execution time of the inner product
algorithm against the running time predicted by the theoretical MultiBSP cost
function.
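To illustrate how such a prediction is evaluated, the sketch below computes the cost of an inner product under a deliberately simplified, hypothetical MultiBSP cost model. It is not necessarily the cost function used in this work. Each of the P cores first computes a partial sum over n/P elements (2n/P flops). Then, level by level, the p partial sums of a component's children are sent to one of them, giving a (p - 1)-relation. This step is followed by one synchronization and p - 1 additions. The class `Level`, the function name and the parameter values are all assumptions, and costs are expressed in flop units.

```python
from dataclasses import dataclass
from math import prod
from typing import List


@dataclass
class Level:
    """One level of a MultiBSP instance (hypothetical representation)."""
    p: int    # number of subcomponents at this level
    g: float  # communication cost per word, in flop units
    L: float  # synchronization cost, in flop units


def predicted_inner_product_cost(n: int, levels: List[Level]) -> float:
    """Simplified cost model (an assumption, not the paper's formula).

    levels[0] is the level closest to the cores; the vectors of length n
    are assumed to be evenly distributed over all cores.
    """
    P = prod(level.p for level in levels)  # total number of cores
    cost = 2.0 * n / P                     # local multiply-adds
    for level in levels:
        # (p - 1) words received by one subcomponent, one sync, (p - 1) adds.
        cost += (level.p - 1) * level.g + level.L + (level.p - 1)
    return cost


# Hypothetical two-level machine: 2 sockets with 4 cores each.
levels = [Level(p=4, g=2.0, L=50.0), Level(p=2, g=10.0, L=500.0)]
print(predicted_inner_product_cost(1_000_000, levels))  # 250570.0
```

For large n the local computation term dominates. For small n the sum of the L values dominates, which is where an accurate estimate of the synchronization parameters matters most for the prediction.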
3 The MBSPDiscover Benchmark for MultiBSP
This section presents the design and implementation of the MBSPDiscover benchmark
to estimate the g and L parameters that characterize a MultiBSP machine.
3.1 Motivation
Multicore architectures are widely used for HPC applications, and both the number
of cores and the number of cache levels have been steadily increasing in recent
years. There is therefore a real need to identify and evaluate the parameters
that characterize the structure of cores and memories, not only to understand
and compare different architectures, but also to use them wisely when designing
HPC applications. This characterization is motivated by the fact that the
performance gains obtained from a multicore processor depend strongly on the
algorithms, their implementation, and how well they exploit the hardware
capabilities.
As mentioned previously, this work follows the MultiBSP model, which specifies
the parameters needed to characterize a multicore machine. In this model, the
performance of a parallel algorithm depends on parameters such as communication
and synchronization costs, the number of cores, and the size of the caches.
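The parameters listed above can be grouped per level of the memory hierarchy. The sketch below is a minimal, hypothetical description of a MultiBSP instance with one record per level, holding the number of subcomponents p, the communication cost g, the synchronization cost L, and a memory size m (for example a cache capacity). The class and function names, the use of bytes for m, and the example values are assumptions, not measurements or code from MBSPDiscover. The two helpers show how the number of cores follows from the p values and how the cache sizes determine the lowest level at which a given working set fits.

```python
from dataclasses import dataclass
from math import prod
from typing import List


@dataclass(frozen=True)
class MultiBSPLevel:
    """One level of a MultiBSP instance (hypothetical representation)."""
    p: int    # number of lower-level components inside one component of this level
    g: float  # communication cost per word with this level's memory (flop units)
    L: float  # synchronization cost at this level (flop units)
    m: int    # memory size of one component at this level, in bytes (assumption)


def total_cores(levels: List[MultiBSPLevel]) -> int:
    """Total number of cores: the product of p over all levels."""
    return prod(level.p for level in levels)


def lowest_fitting_level(levels: List[MultiBSPLevel], working_set: int) -> int:
    """Index of the first level, from the bottom, whose memory holds working_set bytes; -1 if none."""
    for i, level in enumerate(levels):
        if working_set <= level.m:
            return i
    return -1


# Hypothetical machine: 4 cores sharing a 2 MiB cache, 2 such groups sharing 16 GiB.
levels = [
    MultiBSPLevel(p=4, g=1.0, L=100.0, m=2 * 1024 * 1024),
    MultiBSPLevel(p=2, g=5.0, L=1000.0, m=16 * 1024 ** 3),
]
print(total_cores(levels))                       # 8
print(lowest_fitting_level(levels, 1_000_000))   # 0
print(lowest_fitting_level(levels, 10 ** 12))    # -1
```

Here levels[0] is the level closest to the cores. A benchmark such as the one described in this section would fill in the g and L fields from measurements rather than from hardware specifications.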
 