extends BSP in two ways: i) it is a hierarchical model, with an arbitrary number
of components, taking into account the physical structure of multiple memory
and cache levels within single chips as well as in multi-chip architectures; and
ii) at each level, MultiBSP incorporates memory size as an additional parameter
in the model, which was not included in the original BSP.
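As an illustration of the per-level parameters, the following minimal sketch
represents a MultiBSP machine as a list of levels, each holding the usual
(p, g, L, m) tuple of the model. The class name, the helper function, and all
numeric values are hypothetical and do not describe any machine measured in
this paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Level:
    """One level of a MultiBSP machine (field values below are made up)."""
    p: int    # number of sub-components contained in one component of this level
    g: float  # communication cost parameter towards the next level up
    L: float  # synchronization cost for the sub-components of this level
    m: int    # memory or cache size available at this level, in bytes


def total_processors(levels: List[Level]) -> int:
    """The number of cores is the product of p over all levels."""
    count = 1
    for level in levels:
        count *= level.p
    return count


# Hypothetical two-level machine: 4 cores share a cache, 2 such chips share RAM.
example_machine = [
    Level(p=4, g=1.0, L=10.0, m=8 * 2**20),
    Level(p=2, g=4.0, L=100.0, m=16 * 2**30),
]
print(total_processors(example_machine))
```

Listing the levels from the innermost to the outermost keeps the memory size
next to the level it belongs to, which is the parameter MultiBSP adds to BSP.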
In this line of work, the research reported in this paper focuses on
characterizing multicore computing architectures, which are described by a
series of parameters such as size, latency, and memory levels. When a parallel
algorithm is designed for the MultiBSP computational model, the programmer
needs to know the values of the parameters that describe the architecture,
since the performance of the resulting algorithm depends on them. Moreover,
the MultiBSP programmer needs to design the application with multiple levels
of abstraction, which requires appropriate use of threads, cache memories, and
the cores that share these caches.
The proposed benchmark has the following features: a) it computes the MultiBSP
parameters using a bottom-up technique that discovers the architecture and
builds the hierarchy levels following the MultiBSP approach; and b) it is
implemented using the same library that implements the abstraction levels of
the application, so it measures the critical operations taking into account
not only the theoretical aspects but also the specific implementation.
In order to develop the proposed benchmark, we address the following topics:
i) based on the detection of the hierarchy of levels in a multicore machine,
we show how to translate the hierarchy into the components of an abstract
MultiBSP machine; ii) we formally explain all parameters, focusing especially
on communication and synchronization costs; and iii) we introduce the concept
of h-communication, an adaptation of the h-relation of BSP to the specific
case of shared-memory relations within a single node.
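The translation in topic i) can be pictured with a small sketch. It assumes a
symmetric machine described by a hand-written nested dictionary (a
hypothetical format, not one discovered from hardware and not the MBSPDiscover
implementation) and walks it bottom-up, emitting one (p, memory size) pair per
level. The g and L parameters are omitted because they have to be measured.

```python
def discover_levels(node):
    """Return [(p, memory_bytes), ...] ordered from the innermost level outwards.

    Assumes every child of a node has the same shape, so only the first
    child is inspected. A node without children is a bare core and adds
    no level of its own.
    """
    children = node["children"]
    if not children:
        return []
    # Resolve the levels below first, then append this node's own level.
    lower_levels = discover_levels(children[0])
    return lower_levels + [(len(children), node["memory"])]


# Hypothetical topology: 2 chips, each with 4 cores sharing an 8 MiB cache.
core = {"memory": 0, "children": []}
chip = {"memory": 8 * 2**20, "children": [core] * 4}
machine = {"memory": 16 * 2**30, "children": [chip] * 2}

for p, memory in discover_levels(machine):
    print(p, memory)
```

A real tool would obtain the tree from the hardware, for instance through a
topology library, instead of a literal dictionary.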
Our benchmark is applied to characterize two High Performance Computing
(HPC) multicore machines. We also report the validation of the proposed method
by using a real MultiBSP implementation of the vector inner product algorithm
and comparing the predicted execution time against the real execution time.
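To make the idea of comparing predicted and real execution times concrete,
the sketch below uses a deliberately simplified cost model in the spirit of
the BSP superstep cost (work plus h times g plus L). The formula, the
function names, and every number are assumptions chosen for illustration;
they are not the cost expression or the measurements used in this paper.

```python
from math import prod


def predict_inner_product_time(n, levels, op_time):
    """Estimate the time of an inner product of two length-n vectors.

    levels: list of (p, g, sync) tuples from the innermost level outwards,
            with g in seconds per word and sync in seconds per barrier.
    op_time: seconds per arithmetic operation on one core.
    Assumed model: the 2n operations are split evenly over all cores, then
    each level combines p partial sums at a cost of p * g + sync.
    """
    total_cores = prod(p for p, _, _ in levels)
    compute = 2 * n / total_cores * op_time
    combine = sum(p * g + sync for p, g, sync in levels)
    return compute + combine


def relative_error(predicted, measured):
    """Relative deviation of the prediction from the measured time."""
    return abs(predicted - measured) / measured


# Hypothetical parameters and a hypothetical measured time.
levels = [(4, 1e-9, 1e-7), (2, 4e-9, 1e-6)]
predicted = predict_inner_product_time(8_000_000, levels, op_time=1e-9)
print(predicted, relative_error(predicted, measured=2.2e-3))
```

The relative error is the quantity of interest in such a validation: the
smaller it is, the better the measured parameters describe the machine.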
The research reported in this article was carried out within the project
“Scheduling evaluation in heterogeneous computing systems with hwloc”
(SEHLOC 1). Its main goal is to develop runtime systems that combine
characteristics of the software applications with topological information
about the computational platforms, in order to produce scheduling suggestions
that profit from software and hardware affinities and provide a way to
efficiently execute realistic applications.
The paper is organized as follows. Section 2 introduces the BSP and MultiBSP
models and reviews related work on BSP benchmarking. Section 3 describes the
design and implementation of the MBSPDiscover benchmark. Section 4 reports
the application of the proposed benchmark to two case studies and its
validation using a real MultiBSP application. Finally, Section 5 presents
the conclusions and outlines the main lines for future work.
1 http://runtime.bordeaux.inria.fr/sehloc/
 