Digital Signal Processing Reference
Thus, for $P = 4$, $\frac{4(4-1)}{2!} = 6$ ROMs are needed. The number of ROMs increases rapidly as a function of $P$, but the total ROM size grows far more slowly than in the case where a single ROM is used.
For a case where the PUCU index is 0000, the default ROM that provides PE CU$_3$ and PE CU$_4$ is used even though no connection will be made by the Exchange Controller. For the index 1111, three ROMs are used, but not at the same time. When one ROM is used, the size of the ROM becomes $2^{4(3+2)}$ (1M entries). On the other hand, when 6 ROMs are used, the size of each ROM becomes $2^{2(3+2)} = 2^{10}$ (1K entries).
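The arithmetic behind these ROM sizes can be checked with a short sketch. It assumes one ROM per pair of PEs and 3 + 2 address bits per PE, which is read off the exponents in the text rather than stated outright; the function names are hypothetical.

```python
from math import comb

# Assumption: each PE contributes 3 + 2 address bits, as in the exponents above.
BITS_PER_PE = 3 + 2


def single_rom_entries(num_pes: int) -> int:
    """Entries in one ROM that is addressed by all PEs at once."""
    return 2 ** (num_pes * BITS_PER_PE)


def pairwise_rom_plan(num_pes: int) -> tuple:
    """Return (rom_count, entries_per_rom, total_entries) for one ROM per PE pair."""
    rom_count = comb(num_pes, 2)              # number of distinct PE pairs
    entries_per_rom = 2 ** (2 * BITS_PER_PE)  # each ROM is addressed by two PEs
    return rom_count, entries_per_rom, rom_count * entries_per_rom


if __name__ == "__main__":
    for p in (4, 8):
        count, per_rom, total = pairwise_rom_plan(p)
        print(f"P={p}: single ROM {single_rom_entries(p)} entries; "
              f"{count} ROMs x {per_rom} entries = {total} entries")
```

For $P = 4$ this gives a single ROM of $2^{20}$ entries against 6 ROMs of $2^{10}$ entries each, which illustrates why the ROM count can grow quickly while the total storage stays small.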
4 Architecture Evaluation
4.1 Guaranteeing Complete Redistribution
As we have discussed in the previous section, as long as we can maintain
$N^{RB}_i + N^{TB}_{i-1} + N^{TPR}_i$ for all PE CU$_i$, we can guarantee that each PE will
get enough particles for the next iteration. During internal particle balancing, the replication factor of any particle may be large. The splitting operator reduces this factor whenever the particle is transferred, so that it becomes small enough to balance the particles among the PE CU$_i$s.
In addition to meeting the particle-count requirement, internal particle balancing must avoid deadlock. Deadlock occurs when all the PE CU$_i$s become sources (i.e., they all have particles to send out) or all become destinations (i.e., they all need particles). A priority decoder eliminates this deadlock completely by deciding the appropriate source and destination pair during internal particle balancing. Thus, as long as the resampling provides a total number of replicated
particles (
R
N
RT B
ui
≥
T
) larger than
M
, which is guaranteed by the quantization scheme, a
perfect redistribution is guaranteed.
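The role of the priority decoder can be illustrated with a minimal software sketch. This is not the hardware design described here: the fixed lowest-index-first priority, the per-PE `target` count, and the names `priority_decode` and `balance` are all assumptions made for illustration. The `moved` quantity stands in for the portion of a replication factor that the splitting operator hands over in one transfer.

```python
def priority_decode(surplus):
    """Pick one (source, destination) pair, lowest index first.

    surplus[i] > 0 marks a source, surplus[i] < 0 marks a destination.
    Returns None when no valid pair exists (all sources or all destinations).
    """
    src = next((i for i, s in enumerate(surplus) if s > 0), None)
    dst = next((i for i, s in enumerate(surplus) if s < 0), None)
    if src is None or dst is None:
        return None
    return src, dst


def balance(counts, target):
    """Move particles between units until no destination is short of `target`."""
    counts = list(counts)
    transfers = []
    while True:
        surplus = [c - target for c in counts]
        pair = priority_decode(surplus)
        if pair is None:
            break
        src, dst = pair
        # Split: send only as much as the source can spare and the destination needs.
        moved = min(surplus[src], -surplus[dst])
        counts[src] -= moved
        counts[dst] += moved
        transfers.append((src, dst, moved))
    return counts, transfers


if __name__ == "__main__":
    # Hypothetical example: 4 units, 4 particles needed per unit, 17 available.
    print(balance([9, 2, 3, 3], target=4))
```

Because each step settles at least one source or one destination, the loop always terminates. Every unit reaches `target` whenever the total particle count is at least the total required, which mirrors the condition stated in the text.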
+
4.2 Scalability of the Architecture
Thus far, we have implicitly assumed that the number of PEs is a power of two. However, the architecture is perfectly scalable in the sense that the execution time is $M/P$, without overhead, for any number of PEs, as long as enough busses are available between the PEs and the CU. The only condition that depends on the number of PEs (i.e., $P$) is the number of split operations required at the interface between the PE and the CU. The number of split operators required is given by

$$S = \log_2 P \qquad (9)$$
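As a quick numerical illustration of the scaling claim, the sketch below evaluates the execution time $M/P$ and the split-operator count, assuming Equation (9) reads $S = \log_2 P$. The particle count $M = 1024$ and the function names are hypothetical, and the text does not say how a non-integer value of $S$ would be rounded when $P$ is not a power of two.

```python
import math


def execution_time(num_particles: int, num_pes: int) -> float:
    """Execution time in particle slots, M / P, with no overhead term."""
    return num_particles / num_pes


def split_operators(num_pes: int) -> float:
    """Split operators at the PE/CU interface, assuming S = log2(P)."""
    return math.log2(num_pes)


if __name__ == "__main__":
    M = 1024  # hypothetical particle count
    for p in (2, 4, 6, 8):
        print(f"P={p}: time={execution_time(M, p):.1f}, S={split_operators(p):.3f}")
```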