Information Technology Reference
In-Depth Information
Problem
Dimensions
Area
Time
Shifting of
n
-vector
1D
O
(
n
)
O
(
n
)
O
(
√
n
)
2D
O
(
n
)
Summing
n
items
1D
O
(
n
)
O
(
n
)
O
(
√
n
)
2D
O
(
n
)
Broadcasting to
n
locations
O
(
n
)
O
(
n
)
1D
O
(
√
n
)
O
(
n
)
2D
n
-point FFT
O
(
n
)
O
(
n
)
1D
2D
O
(
n
)
O
(
√
n
)
Figure 12.3
Area vs. time performance of VLSI algorithms for four problems.
n
matrix-
matrix multiplication each require area
A
and time
T
satisfying
AT
2
=Ω(
n
2
)
.Consequently,
the 2D algorithms cited above for these problems are optimal to within a constant factor.
In the next section we now show that every normal algorithm can be implemented
on
the cube-connected cycles (CCC) network in time
T
satisfying
Ω(log
n
)
In Section
12.6
we show that shifting of an
n
-vector, the
n
-point FFT, and
n
×
O
(
√
n
)
and that the CCC network can be embedded in the plane using area
A
=
O
(
n
2
/T
2
)
.In
Theorems
12.7.2
and
12.7.3
we show that these implementations are optimal up to constant
multiplicative factors with respect to area and time for the three problems mentioned above.
≤
T
≤
12.5.3 Layout of the CCC Network
In Section
7.7.6
we describe the realization of a fully normal algorithm on the canonical CCC
network. The realization extends directly from the canonical CCC network to a general
(
k
,
d
)
-
CCC network in which there are 2
d
cycles and 2
k
vertices on each cycle. (See Fig.
12.4
.)
A fully normal algorithm is simulated on the CCC network by giving the processors on
the
j
th cycle, 0
2
d
1, the addresses
i
+
j
2
k
where 0
2
k
≤
j
≤
−
≤
i
≤
−
1. The cycles
are treated as 1D arrays and used to simulate a normal algorithm on the first
k
dimensions
exactlyasisdoneinSection
7.7.6
. These simulations are done in parallel after which the
swaps across the higher-order
d
dimensions are simulated by first rotating the leading element
on each cycle to the first of the inter-cycle edges. After executing one swap, each cycle is
advanced one step so that the second elements on each cycle are aligned with the first of the
high-order dimensions. At this point the first elements on each cycle are aligned with the edge
associated with the second of the high-order dimensions. Thus, while swaps are done between
the second elements on each cycle across the first of the high-order dimensions, swaps occur
between leading elements along the second of the high-order dimensions. This rotating and
swapping is done until all cycle elements have been swapped across all high-order dimensions.
This algorithm performs
O
(
2
k
)
steps on the cycles to perform swaps across low-order
dimensions and align the cycles for swaps at higher dimensions. An additional
O
(
d
)
steps are
used to perform swaps on the
d
high-order dimensions. Thus, the number of steps used by
this algorithm,
T
, satisfies
T
=
O
(
2
k
+
d
)
. The number of processors used in
(
k
,
d
)
-CCC
network,
n
, satisfies
n
=
2
d
+
k
.
Search WWH ::
Custom Search