MBSPDiscover: An Automatic Benchmark for MultiBSP Performance Analysis - High Performance Computing

Information Technology Reference

In-Depth Information

2 BSP and MultiBSP Models

To set the scope of this paper, this section describes the BSP and MultiBSP mod-

els. We start with a brief description of the flat BSP model and how it evolved

into the concept of multicore, which emphasizes hierarchies of components.

2.1 The Original BSP Model

The BSP model considers an abstract parallel computer, which is fully mod-

eled by a set of parameters: p —number of processors, s —processor speed, g —

communication cost, and l —synchronization cost. Using these parameters, the

execution time of any BSP algorithm can be calculated.

In the BSP model, the computations are organized in a sequence of global

supersteps , which consist of three phases: i ) every participating processor per-

forms local computations, i.e., each process can only make use of values stored

in the local memory of the processor; ii ) the processes exchange data between

themselves to facilitate remote data storage capabilities and iii ) every partic-

ipating process must reach the next synchronization barrier, i.e., each process

waits until all other processes have reached the same barrier. Then, the next

superstep can begin.

The practical model of programming is Single Program Multiple Data (SPMD),

implemented as C/C++ program copies running on p processors, wherein com-

munication and synchronization among copies are performed using specific li-

braries such as BSPlib [4] or PUB [2]. In addition to defining an abstract machine

and imposing a structure on parallel programs, the BSP model provides a cost

function modeled by the architecture parameters.

The total running time of a BSP program can be calculated as the accumu-

lative sum of the cost of its supersteps, where the cost of each superstep is the

sum of three quantities: i ) w , the maximum number of calculations performed

by each processor; ii ) h

g ,where h is the maximum of the messages sent/re-

ceived by each processor, with each word costing g units of time; and iii ) l ,the

time cost of the barrier synchronizing the processors. The effect of the computer

architecture is included by the parameters g and l . These values, along with the

processor speed s , can be empirically determined for each parallel computer by

executing benchmark programs at installation time.

×

2.2 The New MultiBSP Model

Modern supercomputers are made of highly parallel nodes with tens of cores.

The eciency of these nodes required improvements of the memory subsystem

by adding multiple hierarchical levels of caches as well as a distributed memory

interconnect causing Non-Uniform Memory Access (NUMA). In 2010, Valiant

updated the BSP model to account for this situation, resulting in the MultiBSP

model. It was defined with the same abstractions and bridge architecture as the

original BSP, but adapted to multicore machines.

Search WWH ::

Custom Search

Home