Information Technology Reference
In-Depth Information
Problem
Dimensions
Area
Time
Shifting of n -vector
1D
O ( n )
O ( n )
O ( n )
2D
O ( n )
Summing n items
1D
O ( n )
O ( n )
O ( n )
2D
O ( n )
Broadcasting to n locations
O ( n )
O ( n )
1D
O ( n )
O ( n )
2D
n -point FFT
O ( n )
O ( n )
1D
2D O ( n ) O ( n )
Figure 12.3 Area vs. time performance of VLSI algorithms for four problems.
n matrix-
matrix multiplication each require area A and time T satisfying AT 2 =Ω( n 2 ) .Consequently,
the 2D algorithms cited above for these problems are optimal to within a constant factor.
In the next section we now show that every normal algorithm can be implemented on
the cube-connected cycles (CCC) network in time T satisfying Ω(log n )
In Section 12.6 we show that shifting of an n -vector, the n -point FFT, and n
×
O ( n )
and that the CCC network can be embedded in the plane using area A = O ( n 2 /T 2 ) .In
Theorems 12.7.2 and 12.7.3 we show that these implementations are optimal up to constant
multiplicative factors with respect to area and time for the three problems mentioned above.
T
12.5.3 Layout of the CCC Network
In Section 7.7.6 we describe the realization of a fully normal algorithm on the canonical CCC
network. The realization extends directly from the canonical CCC network to a general ( k , d ) -
CCC network in which there are 2 d cycles and 2 k vertices on each cycle. (See Fig. 12.4 .)
A fully normal algorithm is simulated on the CCC network by giving the processors on
the j th cycle, 0
2 d
1, the addresses i + j 2 k where 0
2 k
j
i
1. The cycles
are treated as 1D arrays and used to simulate a normal algorithm on the first k dimensions
exactlyasisdoneinSection 7.7.6 . These simulations are done in parallel after which the
swaps across the higher-order d dimensions are simulated by first rotating the leading element
on each cycle to the first of the inter-cycle edges. After executing one swap, each cycle is
advanced one step so that the second elements on each cycle are aligned with the first of the
high-order dimensions. At this point the first elements on each cycle are aligned with the edge
associated with the second of the high-order dimensions. Thus, while swaps are done between
the second elements on each cycle across the first of the high-order dimensions, swaps occur
between leading elements along the second of the high-order dimensions. This rotating and
swapping is done until all cycle elements have been swapped across all high-order dimensions.
This algorithm performs O ( 2 k ) steps on the cycles to perform swaps across low-order
dimensions and align the cycles for swaps at higher dimensions. An additional O ( d ) steps are
used to perform swaps on the d high-order dimensions. Thus, the number of steps used by
this algorithm, T , satisfies T = O ( 2 k + d ) . The number of processors used in ( k , d ) -CCC
network, n , satisfies n = 2 d + k .
 
Search WWH ::




Custom Search