Digital Signal Processing Reference
In-Depth Information
Fig. 16 Architectural extension (b) of the 2D-systolic array (a) for introducing disparity level
parallelism, where d_m specifies the number of disparity levels processed in parallel
In order to introduce disparity level parallelism in addition to the row level
parallelism, the C-PEs and L-PEs are extended to process several consecutive
disparity levels in parallel. The resulting groups of d_m disparity levels are
processed serially, one group after another. This leads to an approximately linear
increase in throughput with d_m. Further, the extension is area efficient for two
reasons. First, additional logic is only required for parts of the processing units.
Second, the absolute size of the local buffers does not change, only their
depth-to-width ratio, which is a major advantage of disparity level parallelism.
The architectural extension for disparity level parallelism is shown in Fig. 16.
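The trade-off described above can be illustrated with a small model. This is only a sketch under stated assumptions: one local buffer entry per disparity level, a hypothetical cost word width `cost_bits`, and a disparity range that is an integer multiple of d_m. The function name and the dictionary keys are illustrative and do not come from the original design.

```python
import math

def disparity_parallel_schedule(num_disparities, d_m, cost_bits):
    """Model serial steps and local buffer shape for d_m parallel disparity levels.

    Assumption (not from the source): the buffer holds one cost word of
    cost_bits per disparity level, packed d_m words per buffer line.
    """
    groups = math.ceil(num_disparities / d_m)   # groups are processed serially
    buffer_depth = groups                       # one buffer line per group
    buffer_width_bits = d_m * cost_bits         # d_m cost words side by side
    return {
        "serial_steps": groups,
        "buffer_depth": buffer_depth,
        "buffer_width_bits": buffer_width_bits,
        "buffer_bits": buffer_depth * buffer_width_bits,
    }

if __name__ == "__main__":
    for d_m in (1, 2, 4, 8):
        print(d_m, disparity_parallel_schedule(128, d_m, cost_bits=8))
```

In this model the number of serial steps per pixel falls in proportion to d_m, while the total number of buffer bits stays constant; only the buffer becomes shallower and wider.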
Boundary treatment for pixels with missing stereo overlap (i.e. x < d_max)
significantly reduces the number of entries of the cost spaces C(p, d), L_r(p, d),
and S(p, d) and, consequently, leads to a computing time reduction. For VGA images
and a disparity range of 128 px the reduction is 9.9% (without disparity level
parallelism).
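The quoted reduction can be checked with a short calculation. The sketch below assumes that disparities run from 0 to d_range - 1 and that a cost entry (x, d) exists only if the matching pixel at column x - d lies inside the image; these conventions and the function names are assumptions made for illustration.

```python
def valid_entries_per_row(width, d_range):
    """Count cost entries (x, d) in one image row that have stereo overlap.

    Assumption: entry (x, d) is valid only if x - d >= 0, so column x
    contributes min(x + 1, d_range) disparity candidates.
    """
    return sum(min(x + 1, d_range) for x in range(width))

def boundary_reduction(width, d_range):
    """Fraction of cost-space entries saved by skipping invalid candidates."""
    full = width * d_range
    return 1.0 - valid_entries_per_row(width, d_range) / full

if __name__ == "__main__":
    # VGA width and a disparity range of 128, as in the text.
    print(f"{boundary_reduction(640, 128):.1%}")  # prints 9.9%
```

Because every row has the same pattern of invalid entries, the image height does not influence the relative reduction.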
An external interim memory is required for storing the path costs of the three
non-horizontal paths of the last row of an image slice and providing them to the
first row of the consecutive image slice. Because the data transfer is extremely
regular, obeys the FIFO principle, and requires only low transfer rates, external
SSRAM and SDRAM memories can be used. Alternatively, on-chip memory can be
considered, since the absolute memory requirements are quite low.
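As a rough illustration of the interim memory, the sketch below estimates an upper bound on the number of stored path costs for one row and mimics the FIFO access pattern. The path-cost word width `cost_bits` is a hypothetical parameter (the text does not state it), boundary treatment is ignored, and all function names are illustrative.

```python
from collections import deque

NON_HORIZONTAL_PATHS = 3  # from the text: three non-horizontal paths cross a slice boundary

def interim_entries(width, d_range, paths=NON_HORIZONTAL_PATHS):
    """Upper bound on path-cost entries for one row (boundary treatment ignored)."""
    return paths * width * d_range

def interim_bytes(width, d_range, cost_bits):
    """Storage in bytes for a hypothetical path-cost word width of cost_bits."""
    return interim_entries(width, d_range) * cost_bits // 8

def fifo_round_trip(columns):
    """Write per-column data for the last row, read it back in the same order."""
    fifo = deque()
    for x in columns:              # written while the last row of a slice is processed
        fifo.append(x)
    return [fifo.popleft() for _ in range(len(fifo))]  # read by the next slice's first row

if __name__ == "__main__":
    print(interim_entries(640, 128))
    print(interim_bytes(640, 128, cost_bits=8))
    print(fifo_round_trip(range(4)))
```

The strictly sequential write and read order is what makes burst-oriented external memories or a simple on-chip FIFO suitable here.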
3.7.3 Performance
Performance of the complete system and scalability of the SGM unit are analyzed
using the minimum clock frequency required to fulfill a fixed throughput constraint.
This metric, i.e. the clock frequency normalized for a fixed throughput, allows a direct
and accurate comparison and reflects the importance of performance while being
independent of varying operating clock frequencies [70]. It also models a
typical design constraint of real-world applications, where the required throughput
is usually specified by external circumstances (e.g. by the cameras, the required depth
resolution, etc.). In this case, throughput-normalized metrics for clock frequency,
 
 