Two-Level Constraint Solver and Pipelined Local Batching for Rigid Body Simulation on GPUs - GPU Pro: Advanced Rendering Techniques - page 457


PrefixScan( counts, offsets );

// Reorder contacts using sortData
execute( ReorderContactKernel, contacts, contactsSorted, sortData );

SetSortDataKernel computes the cell index of each contact and writes a key-value pair (cell ID, contact index) for it as follows.

__kernel
void SetSortDataKernel( __global Contact* gContact,
                        __global Body* gBodies,
                        __global int2* gSortDataOut )
{
    int gIdx = get_global_id(0);

    // The cell is chosen from the position of body A only
    int aIdx = gContact[gIdx].m_bodyA;
    float4 p = gBodies[aIdx].m_pos;

    int xIdx = convertToCellIdx( p.x );
    int zIdx = convertToCellIdx( p.z );

    // key = cell ID, value = original contact index
    gSortDataOut[gIdx].x = computeUniqueId( xIdx, zIdx );
    gSortDataOut[gIdx].y = gIdx;
}
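The kernel above calls convertToCellIdx and computeUniqueId, which are not defined on this page. The following is a minimal host-side sketch in C of what such helpers might look like. The grid parameters CELL_SIZE, GRID_ORIGIN, and GRID_RES, the clamping behavior, and the row-major flattening are all assumptions made for illustration, not values taken from the chapter.

```c
#include <assert.h>

/* Hypothetical grid parameters; the chapter does not give values here. */
#define CELL_SIZE   2.0f      /* edge length of one grid cell (world units) */
#define GRID_ORIGIN (-64.0f)  /* world coordinate that maps to cell 0 */
#define GRID_RES    64        /* number of cells along each axis */

/* Map a world-space coordinate to a cell index, clamped to [0, GRID_RES - 1]. */
int convertToCellIdx(float x)
{
    float t = (x - GRID_ORIGIN) / CELL_SIZE;
    if (t < 0.0f) return 0;
    if (t >= (float)GRID_RES) return GRID_RES - 1;
    return (int)t; /* t is non-negative here, so truncation equals floor */
}

/* Flatten a 2D cell coordinate into one sortable key (row-major, assumed). */
int computeUniqueId(int xIdx, int zIdx)
{
    assert(xIdx >= 0 && xIdx < GRID_RES);
    assert(zIdx >= 0 && zIdx < GRID_RES);
    return zIdx * GRID_RES + xIdx;
}
```

With these assumed parameters, contacts whose body A lies in the same 2 x 2 column of the xz-plane receive the same key, so sorting by key places them next to each other.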

Once the key-value pairs are sorted by key, the contacts are reordered in ReorderContactKernel.

__kernel
void ReorderContactKernel( __global Contact* in,
                           __global Contact* out,
                           __global int2* sortData )
{
    int gIdx = get_global_id(0);

    // sortData is sorted by key; .y holds the original contact index
    int srcIdx = sortData[gIdx].y;
    out[gIdx] = in[srcIdx];
}
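To check the GPU path, a serial reference on the host can be useful. The sketch below sorts the key-value pairs, gathers the contacts in the same way as ReorderContactKernel, and builds per-cell counts and offsets with an exclusive prefix scan. The Contact and SortData structs are simplified stand-ins, sortContactsByCell is a hypothetical name, and the interpretation of counts and offsets as per-cell contact counts and start offsets is an assumption, since their definition is not shown on this page.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-ins for the chapter's types (assumed layout). */
typedef struct { int m_bodyA, m_bodyB; } Contact;
typedef struct { int x; /* key: cell ID */ int y; /* value: source index */ } SortData;

/* Order pairs by key; ties are broken by source index so the result is deterministic. */
static int compareSortData(const void *a, const void *b)
{
    const SortData *l = (const SortData *)a;
    const SortData *r = (const SortData *)b;
    if (l->x != r->x) return (l->x < r->x) ? -1 : 1;
    return (l->y > r->y) - (l->y < r->y);
}

/* Sorts sortData in place, gathers contacts into contactsSorted, and fills
   counts[c] and offsets[c] for each of the numCells cells. */
void sortContactsByCell(const Contact *contacts, Contact *contactsSorted,
                        SortData *sortData, int numContacts,
                        int *counts, int *offsets, int numCells)
{
    qsort(sortData, (size_t)numContacts, sizeof(SortData), compareSortData);

    for (int c = 0; c < numCells; ++c) counts[c] = 0;

    for (int i = 0; i < numContacts; ++i) {
        assert(sortData[i].x >= 0 && sortData[i].x < numCells);
        counts[sortData[i].x]++;
        /* Same gather as ReorderContactKernel: out[i] = in[sortData[i].y] */
        contactsSorted[i] = contacts[sortData[i].y];
    }

    /* Exclusive prefix scan: offsets[c] is where cell c starts in contactsSorted. */
    int running = 0;
    for (int c = 0; c < numCells; ++c) {
        offsets[c] = running;
        running += counts[c];
    }
}
```

After the call, the contacts of cell c occupy contactsSorted[offsets[c]] through contactsSorted[offsets[c] + counts[c] - 1], which is the contiguous range a per-cell solver would read.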

4.4.2 Pipelined Local Batching

Each constraint group can be processed in parallel; thus, each group is assigned to a SIMD of the GPU, and all groups are processed in a single kernel dispatch. However, the batching algorithm described in Section 4.2.2 is a completely serial process, which is inefficient when executed on a GPU. The proposed pipelined local batching transforms the serial batching algorithm into a pipelined parallel algorithm: it decomposes the while loop of Algorithm 4.1 into parts that are processed in parallel. This is not straightforward, because each iteration of the while loop depends on the result of the previous iteration.
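Algorithm 4.1 is not reproduced on this page, so the following C sketch shows only a common form of serial greedy batching, to make the loop-carried dependency concrete. It is an assumption that the chapter's algorithm has this shape. The name batchSerial, the MAX_BODIES bound, and the simplified Contact struct are hypothetical, and special handling of static bodies is omitted.

```c
#include <assert.h>
#include <string.h>

#define MAX_BODIES 1024 /* hypothetical upper bound for this sketch */

/* Simplified stand-in for the chapter's contact type (assumed layout). */
typedef struct { int m_bodyA, m_bodyB; } Contact;

/* Assigns batchIdx[i] so that no two constraints in the same batch share a
   body. Returns the number of batches created. */
int batchSerial(const Contact *c, int n, int *batchIdx)
{
    unsigned char used[MAX_BODIES];
    int numAssigned = 0;
    int batch = 0;

    for (int i = 0; i < n; ++i) {
        assert(c[i].m_bodyA >= 0 && c[i].m_bodyA < MAX_BODIES);
        assert(c[i].m_bodyB >= 0 && c[i].m_bodyB < MAX_BODIES);
        batchIdx[i] = -1; /* -1 marks a constraint as not yet batched */
    }

    /* Each pass builds one batch from the constraints left over by the
       previous pass, which is the dependency between iterations. */
    while (numAssigned < n) {
        memset(used, 0, sizeof used);
        for (int i = 0; i < n; ++i) {
            if (batchIdx[i] != -1) continue;       /* already batched */
            int a = c[i].m_bodyA;
            int b = c[i].m_bodyB;
            if (used[a] || used[b]) continue;      /* body already in this batch */
            used[a] = 1;
            used[b] = 1;
            batchIdx[i] = batch;
            ++numAssigned;
        }
        ++batch;
    }
    return batch;
}
```

In this sketch, the set of unassigned constraints seen by pass k + 1 is known only after pass k has finished, so the passes cannot simply be run side by side.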
