Graphics Reference
In-Depth Information
4.3.3 Constraint Solver
Because the global split creates four dependent sets of constraint groups, as dis-
cussed in Section 4.3.1, four sequential solves are executed. Within a set of
constraint groups, constraint groups can be solved in parallel because they are
independent. A constraint group is solved by a SIMD. The SIMD sequentially
reads constraints of the group from the beginning of the section of the constraint
buffer using a SIMD lane to read a constraint. Once constraints are read by all
lanes, it compares batch indices of the constraints. If constraints belong to dif-
ferent batches, they have a dependency and are not able to be solved in parallel.
Thus, it selects a batch index to solve and if the batch index of a constraint is
different from the batch index, the SIMD lane is masked and the constraint is not
solved. After solving constraints in SIMD, it counts the number of constraints
solved at this step. Then it adds the count to the offset of the constraint buffer
and repeats the operations until all constraints in the group are processed.
4.4 GPU Implementation
The two-level constraint solver has challenges when it comes to implementation on
a GPU. In particular, batching described in Section 4.2.2 cannot be parallelized
easily. Thus, we propose pipelined local batching that extracts parallelism on
the batching by pipelining the algorithm to utilize wide SIMD architecture while
keeping the quality of batches high.
4.4.1 Global Split
Because we use a two-dimensional grid for the split, the belonging cell of a con-
straint can be found from the world position of the constraint. As a unique cell
index is assigned for each cell, cell indices are calculated for all constraints. Once
that is done, constraints are sorted by cell index [Merrill and Grimshaw 11]. After
the sort, bounds of constraints for cells are searched by comparing cell indices of
adjacent constraints. We must ensure that the size of the cell is larger than the
largest extent of rigid bodies; otherwise, non-adjacent cells can have a dependency
because they share the same rigid body.
Host code can be written as follows. It uses parallel primitives such as radix
sort and prefix scan. An OpenCL implementation can be found (for example,
[Harada and Howes 12] and [Coumans 13]).
// Compute sortData for each contact
execute ( SetSortDataKernel , contacts , bodies , sortData );
// Sort sortData . Keys are cell index
RadixSort ( sortData );
// Count number of entries per cell
BoundSearch ( sortData , counts );
//
Convert counts to offsets
Search WWH ::




Custom Search