input frame size is not large enough to make those additional costs negligible, we
convert this intra-frame data independency into instruction-level parallelism,
which can be exploited by VLIW or superscalar processors. The instruction-level
parallelism can be expressed explicitly in an executable file, since the parallelism
is known at compile time. Both VLIW and superscalar processors can exploit static
instruction-level parallelism. Superscalar processors use hardware schemes to
discover instruction parallelism in a program, so a superscalar processor remains
backward compatible with code compiled for older-generation processors. For this
reason, most general-purpose processors are superscalar processors. On the other
hand, with dedicated compiler support a VLIW processor can achieve similar
performance on a program with explicit parallelism while using significantly less
hardware. We use a VLIW processor to exploit the instruction-level parallelism
that results from the intra-frame data independency, since such parallelism can
be expressed explicitly at compile time.
In the following, we introduce our process of converting intra-frame data
independency into instruction-level parallelism. Although the target is a VLIW
processor, most parts of this process benefit superscalar processors as well.
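As a small illustration of this difference (a hedged sketch; the function and array names are hypothetical and do not come from our code), consider two operations on independent pieces of frame data:

/* Two updates that touch disjoint data and therefore have no dependency.
 * A VLIW compiler can see this independence at compile time and pack both
 * operations into one wide instruction word; a superscalar processor would
 * instead rediscover the same independence in hardware at run time. */
void update_pair(float *luma, float *chroma, int i)
{
    luma[i]   = luma[i]   * 0.5f;   /* independent operation 1 */
    chroma[i] = chroma[i] + 1.0f;   /* independent operation 2 */
}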
The first step is to use loop fusion, which combines two similar, adjacent loops
to reduce loop overhead, and loop unrolling, which merges consecutive iterations
of a loop so that, in the absence of loop-carried dependencies, several iterations
can be executed at the same time. Both transformations increase the basic block
size and thus increase the available instruction-level parallelism. Figure 16
shows examples of loop fusion and unrolling.
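As a concrete sketch of the fusion half (the array names and operations below are illustrative assumptions, not taken from Figure 16), two similar, adjacent loops over the same index range can be merged into one:

#include <stddef.h>

/* Before fusion: two similar, adjacent loops over the same range. */
void process_separate(float *dst, const float *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * 2.0f;          /* loop 1 */
    for (size_t i = 0; i < n; i++)
        dst[i] = dst[i] + 1.0f;          /* loop 2 */
}

/* After fusion: one loop whose body contains both statements, giving a
 * larger basic block per trip without changing the computed result. */
void process_fused(float *dst, const float *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] = src[i] * 2.0f;
        dst[i] = dst[i] + 1.0f;
    }
}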
When a loop is executed, there might be dependencies between trips. Instructions
that belong to different trips linked by such dependencies cannot be executed
simultaneously. The essential idea behind loop fusion and loop unrolling is to
decrease the total number of trips that need to be executed by putting more work
into each trip. Loop fusion merges loops together without changing the result of
the executed program. In Figure 16, two loops are merged into one loop, which
increases the number of instructions in each trip. Loop unrolling merges
consecutive trips together to reduce the total trip count; in this example, the
trip count is reduced from four to two when loop unrolling is performed. These
source-code transformations do not change the execution results, but they
increase the number of instructions located in each loop trip and thus the number
of instructions that can be executed simultaneously.
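The trip-count reduction from four to two can be sketched as follows (again a hypothetical loop, not the one in Figure 16):

/* Before unrolling: four trips, one element processed per trip. */
void add_one(float a[4])
{
    for (int i = 0; i < 4; i++)
        a[i] += 1.0f;
}

/* After unrolling by two: two trips, two statements per trip. The two
 * statements in one trip are independent of each other, so a VLIW or
 * superscalar processor can issue them in the same cycle. */
void add_one_unrolled(float a[4])
{
    for (int i = 0; i < 4; i += 2) {
        a[i]     += 1.0f;
        a[i + 1] += 1.0f;
    }
}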
Both loop fusion and loop unrolling increase the basic block size by merging
several basic blocks together. Loop fusion merges basic blocks in the code
domain, in that different code segments are merged, whereas loop unrolling merges
basic blocks in the time domain, in that different loop iterations are merged.
This step increases the code size of each loop trip. However, we do not observe
significant basic block