Information Technology Reference
In-Depth Information
2 BSP and MultiBSP Models
To set the scope of this paper, this section describes the BSP and MultiBSP mod-
els. We start with a brief description of the flat BSP model and how it evolved
into the concept of multicore, which emphasizes hierarchies of components.
2.1 The Original BSP Model
The BSP model considers an abstract parallel computer, which is fully mod-
eled by a set of parameters: p —number of processors, s —processor speed, g
communication cost, and l —synchronization cost. Using these parameters, the
execution time of any BSP algorithm can be calculated.
In the BSP model, the computations are organized in a sequence of global
supersteps , which consist of three phases: i ) every participating processor per-
forms local computations, i.e., each process can only make use of values stored
in the local memory of the processor; ii ) the processes exchange data between
themselves to facilitate remote data storage capabilities and iii ) every partic-
ipating process must reach the next synchronization barrier, i.e., each process
waits until all other processes have reached the same barrier. Then, the next
superstep can begin.
The practical model of programming is Single Program Multiple Data (SPMD),
implemented as C/C++ program copies running on p processors, wherein com-
munication and synchronization among copies are performed using specific li-
braries such as BSPlib [4] or PUB [2]. In addition to defining an abstract machine
and imposing a structure on parallel programs, the BSP model provides a cost
function modeled by the architecture parameters.
The total running time of a BSP program can be calculated as the accumu-
lative sum of the cost of its supersteps, where the cost of each superstep is the
sum of three quantities: i ) w , the maximum number of calculations performed
by each processor; ii ) h
g ,where h is the maximum of the messages sent/re-
ceived by each processor, with each word costing g units of time; and iii ) l ,the
time cost of the barrier synchronizing the processors. The effect of the computer
architecture is included by the parameters g and l . These values, along with the
processor speed s , can be empirically determined for each parallel computer by
executing benchmark programs at installation time.
×
2.2 The New MultiBSP Model
Modern supercomputers are made of highly parallel nodes with tens of cores.
The eciency of these nodes required improvements of the memory subsystem
by adding multiple hierarchical levels of caches as well as a distributed memory
interconnect causing Non-Uniform Memory Access (NUMA). In 2010, Valiant
updated the BSP model to account for this situation, resulting in the MultiBSP
model. It was defined with the same abstractions and bridge architecture as the
original BSP, but adapted to multicore machines.
Search WWH ::




Custom Search