The proposed pipelined local batching uses each SIMD lane as a stage of a
pipeline that creates one batch. Pipelined local batching starts when SIMD lane 0
reads a constraint of the group from the input buffer at the first stage. The lane
checks whether the constraint can be inserted into batch 0. If the constraint is
independent of the constraints already in batch 0, batch index 0 is assigned to it
and it is removed from the pipeline; otherwise, it is forwarded to
the next stage of the pipeline, which is processed by SIMD lane 1. This is one
cycle of the pipeline. During the first cycle, only SIMD lane 0 is active. On
the next cycle, SIMD lane 0 reads the next constraint from the input buffer,
while the other lanes receive constraints from the previous stage of the pipeline. If the
first constraint was forwarded by SIMD lane 0 after the first cycle, SIMD lane 1
receives it. Each lane then checks the dependency of its constraint
on its batch. If the constraint is independent, the lane assigns its batch index to the constraint and
the data is removed from the pipeline; otherwise, it is delivered to the next stage
of the pipeline. As the number of cycles increases, more constraints flow through the
pipeline, and the SIMD lanes at deeper pipeline stages fill with data.
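The lane-by-lane forwarding described above can be simulated sequentially. The sketch below is illustrative only: `NUM_LANES`, `pipelined_batching`, and the `(bodyA, bodyB)` tuple representation of a constraint are assumptions, and a real implementation would run each stage on its own SIMD lane in parallel rather than in a Python loop.

```python
NUM_LANES = 4  # stands in for the 64-wide SIMD discussed in the chapter

def pipelined_batching(constraints):
    """Assign a batch index to each (bodyA, bodyB) constraint.

    Lane i owns batch i and remembers the bodies its batch has locked.
    Each cycle, every lane either accepts its incoming constraint (if it
    is independent of the lane's batch) or forwards it to the next lane.
    """
    locked = [set() for _ in range(NUM_LANES)]  # bodies locked per batch
    stage = [None] * NUM_LANES                  # constraint held by each lane
    batch_of = {}
    overflow = []
    pending = list(constraints)

    while pending or any(s is not None for s in stage):
        # Process lanes from the deepest stage back to lane 0 so that a
        # forwarded constraint advances exactly one stage per cycle.
        for lane in range(NUM_LANES - 1, -1, -1):
            c = stage[lane]
            stage[lane] = None
            if c is None:
                continue
            a, b = c
            if a not in locked[lane] and b not in locked[lane]:
                locked[lane].update((a, b))     # insert into batch `lane`
                batch_of[c] = lane
            elif lane + 1 < NUM_LANES:
                stage[lane + 1] = c             # forward to the next stage
            else:
                overflow.append(c)              # constraint overflows the pipeline
        if pending:
            stage[0] = pending.pop(0)           # lane 0 reads the input buffer
    return batch_of, overflow
```

For example, feeding the constraints `[(0, 1), (1, 2), (2, 3), (0, 2)]` through the simulation places `(0, 1)` and `(2, 3)` in batch 0, while `(1, 2)` and `(0, 2)` are forwarded into batches 1 and 2, so no two constraints in the same batch ever share a body.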
While serial batching starts creating the second batch only once the first batch
is complete, pipelined local batching finishes all batching soon after the first
batch is created. By the time the last constraint of the group is processed by lane 0,
most of the batches are already complete, and the pipeline finishes working once all
the data remaining in it has been processed.
Figure 4.4 illustrates pipelined local batching, and Algorithm 4.2 shows pseudo-
code. Local data store (LDS) is used to forward data between the stages processed
by the lanes of a SIMD. The SIMD width of the GPU used for this chapter is 64;
therefore, it can create up to 64 batches. If a constraint cannot be inserted at
the last lane, it overflows the pipeline. Overflowed constraints could be stored in
a buffer and processed after all the constraints have been processed once. However, we
did not implement this because we never encountered an overflow in our test
cases.
1: nRemainings ← pairs.getSize() // number of pairs not batched yet
2: while nRemainings > 0 do
3:   iPair ← fetchFromBuffer()
4:   if !locked(iPair.x) and !locked(iPair.y) then
5:     batch.add( iPair )
6:     lock( iPair.x )
7:     lock( iPair.y )
8:   else
9:     forwardPairToNextLane( iPair )
10:  end if
11:  nRemainings ← countRemainingPairs()
12: end while
Algorithm 4.2. Pipelined batching.