2^(d−1) positions, which is done by moving them across the largest dimension of the hypercube,
advances them to positions in the sequence that cannot be occupied by any other messages,
even after those messages have been advanced by their full gaps. For example, shown below are
the positions of the messages given above after those whose gaps contain 8 and 4 have been
moved by these amounts:
i        0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
dest_i   1   2   4   5   7                      11  13              15
Repeating this argument on subsequent smaller powers of 2, we find that no two messages
that are routed in a given stage occupy the same site. As a consequence, after D stages, each
taking d steps, all messages are routed. We summarize this result below.
THEOREM 7.9.4 Each computation cycle of a p-processor EREW PRAM can be simulated by a
normal algorithm on a p-vertex hypercube in O(D log p + log² p) steps, where D is the maximum
number of processors accessing memory locations stored at a given vertex of the hypercube.
This result can be improved to O(log p) [158] with a probabilistic algorithm that replicates
each datum a fixed number of times across the hypercube processors.
Because the simulation of an EREW PRAM on a hypercube described above consists of a
fixed number of normal steps and fully normal sequences of steps, O(D√p)-time and O(Dp)-
time simulations of a PRAM on two-dimensional meshes and linear arrays follow. (See Prob-
lems 7.32 and 7.33.)
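To illustrate the routing argument, the following Python sketch (ours, not part of the text) replays the example above: eight messages with destinations 1, 2, 4, 5, 7, 11, 13, 15 start at consecutive positions 0 through 7 on a 16-vertex (d = 4) hypercube, and each is advanced by the powers of 2 appearing in its gap, largest first. The assertions check the claim that no two messages ever occupy the same position, and the state printed after the moves of 8 and 4 matches the table above.

    def route_by_gaps(destinations, d):
        """Advance messages to their destinations by the powers of 2 in
        their gaps, largest power first, checking that no two messages
        ever occupy the same position.  Messages start at consecutive
        positions 0, 1, ..., m-1, sorted by (distinct) destination."""
        pos = list(range(len(destinations)))      # packed starting positions
        for b in reversed(range(d)):              # powers 2^(d-1), ..., 2, 1
            step = 1 << b
            for i, dest in enumerate(destinations):
                if (dest - pos[i]) & step:        # gap "contains" 2^b
                    pos[i] += step
            assert len(set(pos)) == len(pos), "two messages collided"
            print("after moves of %2d: %s" % (step, pos))
        assert pos == list(destinations)          # every message has arrived

    # The example from the text: 8 messages on a 16-vertex hypercube (d = 4).
    route_by_gaps([1, 2, 4, 5, 7, 11, 13, 15], d=4)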
7.9.4 Circuits and the CREW PRAM
Algebraic and logic circuits can also be simulated on PRAMs, in particular the CREW PRAM.
For simplicity we assign one processor to each vertex of a circuit (a gate). We also assume that
each vertex has bounded fan-in, which for concreteness we take to be 2. We also reserve one
memory location for each gate and one for each input variable. Each processor now alternates
between reading values from its two inputs (concurrently with other processors, if necessary)
and exclusively writing values to the location reserved for its value. Two steps are devoted to
reading the values of gate inputs. Let D_Ω(f) be the depth of the circuit for a function f. After
2D_Ω(f) steps the input values have propagated to the output gates, the values they compute
are correct, and the computation is complete.
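As a rough sketch of this simulation (ours, with an invented circuit representation; crew_simulate and the gate names are illustrative only), the following Python code evaluates a fan-in-2 circuit round by round: every gate "processor" concurrently reads its two input cells (collapsed here into a single read phase) and then exclusively writes the cell reserved for its own value, so that after depth-many rounds every gate holds its correct value.

    from operator import and_, or_, xor

    def crew_simulate(inputs, gates, depth):
        """Evaluate a fan-in-2 circuit as a CREW PRAM would: in each round
        every gate concurrently reads its two inputs (CR), then exclusively
        writes the memory cell reserved for its value (EW).
        `gates` maps a gate name to (op, left, right), where left and right
        name either an input variable or another gate."""
        memory = dict(inputs)                     # one cell per input ...
        memory.update({g: None for g in gates})   # ... and one per gate
        for _ in range(depth):
            # Read phase: all reads of this round happen before any write.
            reads = {g: (memory[l], memory[r]) for g, (op, l, r) in gates.items()}
            # Write phase: each gate writes only its own cell.
            for g, (op, _, _) in gates.items():
                a, b = reads[g]
                if a is not None and b is not None:
                    memory[g] = op(a, b)
        return memory

    # Example: f(x1, x2, x3, x4) = (x1 AND x2) XOR (x3 OR x4), depth 2.
    gates = {"g1": (and_, "x1", "x2"),
             "g2": (or_,  "x3", "x4"),
             "g3": (xor,  "g1", "g2")}
    values = crew_simulate({"x1": 1, "x2": 1, "x3": 0, "x4": 0}, gates, depth=2)
    print(values["g3"])                           # prints 1

Here each round stands for the constant number of PRAM steps a gate needs per level, so the total number of steps is proportional to the circuit depth, as in the bound above.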
In Section 8.14 we show a stronger result, that CREW PRAMs and circuits are equivalent
as language recognizers. We also explore the parallel computation thesis, which states that
sequential space and parallel time are polynomially related. It follows that the PRAM and the
logic circuit are both excellent models in terms of which to measure the minimal computation
time required for a problem on a parallel machine. In Section 8.15 we exhibit complexity
classes, that is, classes of languages defined in terms of the depth of circuits recognizing them.
7.10 The BSP and LogP Models
Bulk synchronous parallelism (BSP) extends the MIMD model to potentially different asyn-
chronous programs running on the physical processors of a parallel computer. Its developers
believe that the BSP model is both built on realistic assumptions and sufficiently simple to
provide an attractive model for programming parallel computers. They expect it will play a