2^(d−1) positions, which is done by moving them across the largest dimension of the hypercube,
advances them to positions in the sequence that cannot be occupied by any other messages,
even after those messages have been advanced by their full gaps. For example, shown below are
the positions of the messages given above after those whose gaps contain 8 and 4 have been
moved by these amounts:
i        0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
dest_i   1   2   4   5   7                      11  13              15
Repeating this argument on subsequent smaller powers of 2, we find that no two messages
that are routed in a given stage occupy the same site. As a consequence, after D stages, each
taking d steps, all messages are routed. We summarize this result below.
THEOREM 7.9.4 Each computation cycle of a p-processor EREW PRAM can be simulated by a
normal algorithm on a p-vertex hypercube in O(D log p + log² p) steps, where D is the maximum
number of processors accessing memory locations stored at a given vertex of the hypercube.
This result can be improved to O(log p) [158] with a probabilistic algorithm that replicates
each datum a fixed number of times across the hypercube processors.
Because the simulation of an EREW PRAM on a hypercube described above consists of a
fixed number of normal steps and fully normal sequences of steps, O(D√p)-time and O(Dp)-
time simulations of a PRAM on two-dimensional meshes and linear arrays follow. (See Prob-
lems 7.32 and 7.33.)
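To illustrate the routing argument, the following Python sketch (ours, not part of the text) replays the example above: eight messages with destinations 1, 2, 4, 5, 7, 11, 13, 15 start at consecutive positions 0 through 7 on a 16-vertex (d = 4) hypercube, and each is advanced by the powers of 2 appearing in its gap, largest first. The assertions check the claim that no two messages ever occupy the same position, and the state printed after the moves of 8 and 4 matches the table above.

    def route_by_gaps(destinations, d):
        """Advance messages to their destinations by the powers of 2 in
        their gaps, largest power first, checking that no two messages
        ever occupy the same position.  Messages start at consecutive
        positions 0, 1, ..., m-1, sorted by (distinct) destination."""
        pos = list(range(len(destinations)))      # packed starting positions
        for b in reversed(range(d)):              # powers 2^(d-1), ..., 2, 1
            step = 1 << b
            for i, dest in enumerate(destinations):
                if (dest - pos[i]) & step:        # gap "contains" 2^b
                    pos[i] += step
            assert len(set(pos)) == len(pos), "two messages collided"
            print("after moves of %2d: %s" % (step, pos))
        assert pos == list(destinations)          # every message has arrived

    # The example from the text: 8 messages on a 16-vertex hypercube (d = 4).
    route_by_gaps([1, 2, 4, 5, 7, 11, 13, 15], d=4)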
7.9.4 Circuits and the CREW PRAM
Algebraic and logic circuits can also be simulated on PRAMs, in particular the CREW PRAM.
For simplicity we assign one processor to each vertex of a circuit (a gate). We also assume that
each vertex has bounded fan-in, which for concreteness we take to be 2. We also reserve one
memory location for each gate and one for each input variable. Each processor now alternates
between reading values from its two inputs (concurrently with other processors, if necessary)
and exclusively writing values to the location reserved for its value. Two steps are devoted to
reading the values of gate inputs. Let D_Ω(f) be the depth of the circuit for a function f. After
2D_Ω(f) steps the input values have propagated to the output gates, the values they compute
are correct, and the computation is complete.
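As a rough sketch of this simulation (ours, with an invented circuit representation; crew_simulate and the gate names are illustrative only), the following Python code evaluates a fan-in-2 circuit round by round: every gate "processor" concurrently reads its two input cells (collapsed here into a single read phase) and then exclusively writes the cell reserved for its own value, so that after depth-many rounds every gate holds its correct value.

    from operator import and_, or_, xor

    def crew_simulate(inputs, gates, depth):
        """Evaluate a fan-in-2 circuit as a CREW PRAM would: in each round
        every gate concurrently reads its two inputs (CR), then exclusively
        writes the memory cell reserved for its value (EW).
        `gates` maps a gate name to (op, left, right), where left and right
        name either an input variable or another gate."""
        memory = dict(inputs)                     # one cell per input ...
        memory.update({g: None for g in gates})   # ... and one per gate
        for _ in range(depth):
            # Read phase: all reads of this round happen before any write.
            reads = {g: (memory[l], memory[r]) for g, (op, l, r) in gates.items()}
            # Write phase: each gate writes only its own cell.
            for g, (op, _, _) in gates.items():
                a, b = reads[g]
                if a is not None and b is not None:
                    memory[g] = op(a, b)
        return memory

    # Example: f(x1, x2, x3, x4) = (x1 AND x2) XOR (x3 OR x4), depth 2.
    gates = {"g1": (and_, "x1", "x2"),
             "g2": (or_,  "x3", "x4"),
             "g3": (xor,  "g1", "g2")}
    values = crew_simulate({"x1": 1, "x2": 1, "x3": 0, "x4": 0}, gates, depth=2)
    print(values["g3"])                           # prints 1

Here each round stands for the constant number of PRAM steps a gate needs per level, so the total number of steps is proportional to the circuit depth, as in the bound above.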
In Section 8.14 we show a stronger result, that CREW PRAMs and circuits are equivalent
as language recognizers. We also explore the parallel computation thesis, which states that
sequential space and parallel time are polynomially related. It follows that the PRAM and the
logic circuit are both excellent models in terms of which to measure the minimal computation
time required for a problem on a parallel machine. In Section 8.15 we exhibit complexity
classes, that is, classes of languages defined in terms of the depth of circuits recognizing them.
7.10 The BSP and LogP Models
Bulk synchronous parallelism (BSP) extends the MIMD model to potentially different asyn-
chronous programs running on the physical processors of a parallel computer. Its developers
believe that the BSP model is both built on realistic assumptions and sufficiently simple to
provide an attractive model for programming parallel computers. They expect it will play a