Two-Level Constraint Solver and Pipelined Local Batching for Rigid Body Simulation on GPUs - GPU Pro: Advanced Rendering Techniques - page 456

Graphics Reference

In-Depth Information

4.3.3 Constraint Solver

Because the global split creates four dependent sets of constraint groups, as dis-

cussed in Section 4.3.1, four sequential solves are executed. Within a set of

constraint groups, constraint groups can be solved in parallel because they are

independent. A constraint group is solved by a SIMD. The SIMD sequentially

reads constraints of the group from the beginning of the section of the constraint

buffer using a SIMD lane to read a constraint. Once constraints are read by all

lanes, it compares batch indices of the constraints. If constraints belong to dif-

ferent batches, they have a dependency and are not able to be solved in parallel.

Thus, it selects a batch index to solve and if the batch index of a constraint is

different from the batch index, the SIMD lane is masked and the constraint is not

solved. After solving constraints in SIMD, it counts the number of constraints

solved at this step. Then it adds the count to the offset of the constraint buffer

and repeats the operations until all constraints in the group are processed.

4.4 GPU Implementation

The two-level constraint solver has challenges when it comes to implementation on

a GPU. In particular, batching described in Section 4.2.2 cannot be parallelized

easily. Thus, we propose pipelined local batching that extracts parallelism on

the batching by pipelining the algorithm to utilize wide SIMD architecture while

keeping the quality of batches high.

4.4.1 Global Split

Because we use a two-dimensional grid for the split, the belonging cell of a con-

straint can be found from the world position of the constraint. As a unique cell

index is assigned for each cell, cell indices are calculated for all constraints. Once

that is done, constraints are sorted by cell index [Merrill and Grimshaw 11]. After

the sort, bounds of constraints for cells are searched by comparing cell indices of

adjacent constraints. We must ensure that the size of the cell is larger than the

largest extent of rigid bodies; otherwise, non-adjacent cells can have a dependency

because they share the same rigid body.

Host code can be written as follows. It uses parallel primitives such as radix

sort and prefix scan. An OpenCL implementation can be found (for example,

[Harada and Howes 12] and [Coumans 13]).

// Compute sortData for each contact

execute ( SetSortDataKernel , contacts , bodies , sortData );

// Sort sortData . Keys are cell index

RadixSort ( sortData );

// Count number of entries per cell

BoundSearch ( sortData , counts );

//

Convert counts to offsets

Next Page

GPU Pro: Advanced Rendering Techniques

Search WWH ::

Custom Search

Home