The proposed pipelined local batching uses each SIMD lane as a stage of a
pipeline that creates one batch. Pipelined local batching starts when SIMD lane 0
reads a constraint of the group from the input buffer at the first stage. The lane
checks whether the constraint can be inserted into batch 0. If the constraint is
independent of the constraints already in batch 0, batch index 0 is assigned to it
and it is removed from the pipeline; otherwise, it is forwarded to
the next stage of the pipeline, which is processed by SIMD lane 1. This is one
cycle of the pipeline. During the first cycle, only SIMD lane 0 is active. On
the next cycle, SIMD lane 0 reads the next constraint from the input buffer,
while the other lanes receive constraints from the previous stage of the pipeline. If the
first constraint was forwarded by SIMD lane 0 after the first cycle, SIMD lane 1
receives it. Each lane then checks the dependency of its constraint
on its batch. If the constraint is independent, the lane assigns its batch index to the constraint and
the data is removed from the pipeline; otherwise, it is delivered to the next stage
of the pipeline. As the number of cycles increases, more constraints flow through the
pipeline, and the SIMD lanes at deeper pipeline stages fill with data.
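The lane-by-lane forwarding described above can be simulated sequentially. The sketch below is illustrative only: `NUM_LANES`, `pipelined_batching`, and the `(bodyA, bodyB)` tuple representation of a constraint are assumptions, and a real implementation would run each stage on its own SIMD lane in parallel rather than in a Python loop.

```python
NUM_LANES = 4  # stands in for the 64-wide SIMD discussed in the chapter

def pipelined_batching(constraints):
    """Assign a batch index to each (bodyA, bodyB) constraint.

    Lane i owns batch i and remembers the bodies its batch has locked.
    Each cycle, every lane either accepts its incoming constraint (if it
    is independent of the lane's batch) or forwards it to the next lane.
    """
    locked = [set() for _ in range(NUM_LANES)]  # bodies locked per batch
    stage = [None] * NUM_LANES                  # constraint held by each lane
    batch_of = {}
    overflow = []
    pending = list(constraints)

    while pending or any(s is not None for s in stage):
        # Process lanes from the deepest stage back to lane 0 so that a
        # forwarded constraint advances exactly one stage per cycle.
        for lane in range(NUM_LANES - 1, -1, -1):
            c = stage[lane]
            stage[lane] = None
            if c is None:
                continue
            a, b = c
            if a not in locked[lane] and b not in locked[lane]:
                locked[lane].update((a, b))     # insert into batch `lane`
                batch_of[c] = lane
            elif lane + 1 < NUM_LANES:
                stage[lane + 1] = c             # forward to the next stage
            else:
                overflow.append(c)              # constraint overflows the pipeline
        if pending:
            stage[0] = pending.pop(0)           # lane 0 reads the input buffer
    return batch_of, overflow
```

For example, feeding the constraints `[(0, 1), (1, 2), (2, 3), (0, 2)]` through the simulation places `(0, 1)` and `(2, 3)` in batch 0, while `(1, 2)` and `(0, 2)` are forwarded into batches 1 and 2, so no two constraints in the same batch ever share a body.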
While serial batching starts creating the second batch only once the first batch
is complete, pipelined local batching finishes all batching soon after the first
batch is created. By the time the last constraint of the group is processed by lane 0,
most of the batches are already complete, and the pipeline finishes working once all
the data remaining in it has been processed.
Figure 4.4 illustrates pipelined local batching, and Algorithm 4.2 shows pseudo-
code. Local data store (LDS) is used to forward data between the stages processed
by the lanes of a SIMD. The SIMD width of the GPU used for this chapter is 64;
therefore, it can create up to 64 batches. If a constraint cannot be inserted at
the last lane, it overflows the pipeline. Overflowed constraints could be stored in
a buffer and processed after all the constraints have been processed once. However, we
did not implement this because we never encountered an overflow in our test
cases.
1: nRemainings ← pairs.getSize() // number of pairs not batched yet
2: while nRemainings > 0 do
3:   iPair ← fetchFromBuffer()
4:   if !locked(iPair.x) and !locked(iPair.y) then
5:     batch.add( iPair )
6:     lock( iPair.x )
7:     lock( iPair.y )
8:   else
9:     forwardPairToNextLane( iPair )
10:  end if
11:  nRemainings ← countRemainingPairs()
12: end while
Algorithm 4.2. Pipelined batching.