Figure 18. Instruction cycles for 10 frames (left), and available parallelism
(right).
dependency to data dependency by evaluating the condition in each if-statement.
The results are packed into a single byte, in which each bit holds the result
of one evaluation. We build into the algorithm a lookup table that gives the
outcome for any such byte. With this method, the branches are eliminated and
the instruction-level parallelism of the contour following algorithm block is
increased.
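The idea can be illustrated with a minimal sketch in C. Everything specific in it is an assumption made for illustration and is not taken from the original system: the decision being made (pick the first of eight neighbor directions whose test passed), the names next_dir_branchy, next_dir_lookup, and build_next_dir_table, and the NO_MOVE value. The first function uses a chain of data-dependent branches; the second evaluates all eight conditions unconditionally, packs them into one byte, and reads the answer from a 256-entry table.

```c
#include <stdint.h>

/* Hypothetical decision, for illustration only: given eight condition
   results, return the first direction (0..7) whose condition holds,
   or NO_MOVE if none does. */
enum { NO_MOVE = 8 };

static uint8_t next_dir_table[256];

/* Control-dependent version: each test is a data-dependent branch,
   so later work cannot be scheduled until earlier tests resolve. */
static uint8_t next_dir_branchy(const uint8_t cond[8])
{
    for (int d = 0; d < 8; d++) {
        if (cond[d])
            return (uint8_t)d;
    }
    return NO_MOVE;
}

/* Fill the table once: entry `bits` holds the index of the lowest
   set bit in `bits`, or NO_MOVE when no bit is set. */
static void build_next_dir_table(void)
{
    for (int bits = 0; bits < 256; bits++) {
        uint8_t dir = NO_MOVE;
        for (int d = 7; d >= 0; d--) {
            if (bits & (1 << d))
                dir = (uint8_t)d;   /* lower bits overwrite higher ones */
        }
        next_dir_table[bits] = dir;
    }
}

/* Data-dependent version: all eight conditions are evaluated and
   packed into one byte, then a single table read gives the result.
   The loop has a fixed trip count, so a compiler can unroll it into
   eight independent operations with no data-dependent branch. */
static uint8_t next_dir_lookup(const uint8_t cond[8])
{
    unsigned bits = 0;
    for (int d = 0; d < 8; d++)
        bits |= (unsigned)(cond[d] != 0) << d;
    return next_dir_table[bits];
}
```

Call build_next_dir_table() once at start-up; after that, next_dir_lookup() returns the same value as next_dir_branchy() for every input. The trade-off in this sketch is that all eight conditions are always evaluated, which is extra work but work that can be issued in parallel.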
The results of these optimizations are shown in Figure 18. Here, operations per
instruction is used as a measure of instruction-level parallelism. While
optimizing for higher instruction-level parallelism can significantly improve
system performance, it has limitations. Instruction-level parallelism is
fine-grained, which limits its ability to exploit coarse-grained data
independencies such as inter-frame independency. From a hardware point of view,
increasing global interconnection delay will prevent processor designers from
building a large number of functional units into a single processor, which also
limits the exploitation of instruction-level parallelism. In addition, recent
trends show that both application-specific computer systems and general-purpose
computers are starting to incorporate multiple processors, which will provide
hardware support for exploiting coarse-grained parallelism. Considering this,
we are starting to explore alternative methods.
Inter-Frame-Level Parallelism and Symmetric Architecture
A different level of data independency in our smart camera system is the inter-
frame data independency. Since this independency lies between different input
frames, it is a coarse-grained data independency. The corresponding parallel-