PrefixScan(counts, offsets);
// Reorder contacts using sortData
execute(ReorderContactKernel, contacts, contactsSorted, sortData);
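The PrefixScan step can be pictured as a prefix sum over per-cell contact counts, which yields the index at which each cell's contacts begin in the sorted array. The serial C sketch below is only an illustration: the function name exclusive_prefix_scan is hypothetical, and treating the scan as exclusive (rather than inclusive) is an assumption, since the source does not say.

```c
#include <assert.h>
#include <stddef.h>

/* Exclusive prefix sum (assumed semantics for PrefixScan):
   offsets[i] = counts[0] + ... + counts[i - 1], with offsets[0] = 0.
   Returns the total of all counts. */
int exclusive_prefix_scan(const int *counts, int *offsets, size_t n)
{
    int running = 0;
    assert(counts != NULL && offsets != NULL);
    for (size_t i = 0; i < n; ++i) {
        offsets[i] = running;   /* start index of cell i */
        running += counts[i];   /* advance past cell i's contacts */
    }
    return running;
}
```

With counts of {2, 0, 3, 1}, this produces offsets of {0, 2, 2, 5} and a total of 6. A GPU implementation would compute the same result with a parallel scan.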
SetSortDataKernel computes the cell index and writes a key-value pair for each
contact as follows.
__kernel
void SetSortDataKernel(__global Contact* gContact,
                       __global Body* gBodies,
                       __global int2* gSortDataOut)
{
    int gIdx = get_global_id(0);  // one work-item per contact
    int aIdx = gContact[gIdx].m_bodyA;
    int bIdx = gContact[gIdx].m_bodyB;
    // The cell is chosen from the position of body A
    float4 p = gBodies[aIdx].m_pos;
    int xIdx = convertToCellIdx(p.x);
    int zIdx = convertToCellIdx(p.z);
    gSortDataOut[gIdx].x = computeUniqueId(xIdx, zIdx);  // key: cell ID
    gSortDataOut[gIdx].y = gIdx;                          // value: contact index
}
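The kernel relies on convertToCellIdx and computeUniqueId, which are not defined in this section. The following C sketch shows one plausible way to write them. The constants CELL_SIZE and GRID_DIM, the origin-centred grid, the clamping at the grid border, and the row-major key layout are all assumptions made for illustration, not the author's implementation.

```c
#include <assert.h>

/* Hypothetical grid parameters, not taken from the source. */
#define CELL_SIZE 4.0f  /* world-space edge length of one cell */
#define GRID_DIM  64    /* number of cells along each axis */

/* Map a world coordinate to a cell index in [0, GRID_DIM - 1].
   The grid is assumed to be centred on the origin; positions outside
   the grid are clamped to the border cells. */
int convertToCellIdx(float x)
{
    float c = x / CELL_SIZE + (float)(GRID_DIM / 2);
    if (c < 0.0f) return 0;
    if (c >= (float)GRID_DIM) return GRID_DIM - 1;
    return (int)c;  /* c is non-negative here, so truncation equals floor */
}

/* Flatten a 2D cell coordinate into a single sortable key (row-major). */
int computeUniqueId(int xIdx, int zIdx)
{
    assert(xIdx >= 0 && xIdx < GRID_DIM);
    assert(zIdx >= 0 && zIdx < GRID_DIM);
    return zIdx * GRID_DIM + xIdx;
}
```

Any mapping works as long as contacts in the same cell receive the same key, because the sort only needs to bring them together.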
Once the key-value pairs are sorted, contacts are reordered in ReorderContactKernel.
__kernel
void ReorderContactKernel(__global Contact* in,
                          __global Contact* out,
                          __global int2* sortData)
{
    int gIdx = get_global_id(0);
    int srcIdx = sortData[gIdx].y;  // original index of the contact that belongs at gIdx
    out[gIdx] = in[srcIdx];
}
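The sort and reorder steps can be checked against a serial reference. The C sketch below is a CPU stand-in under stated assumptions: SortData mirrors the int2 pair (x is the cell key, y is the original contact index), Contact is reduced to two body indices, and qsort replaces whatever GPU sort the original uses. The name sortAndReorderContacts is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int x, y; } SortData;            /* x = cell key, y = contact index */
typedef struct { int m_bodyA, m_bodyB; } Contact; /* reduced stand-in for the real struct */

/* Order by key, then by original index so the result is deterministic. */
static int compareSortData(const void *a, const void *b)
{
    const SortData *l = a;
    const SortData *r = b;
    if (l->x != r->x) return (l->x > r->x) - (l->x < r->x);
    return (l->y > r->y) - (l->y < r->y);
}

/* Sort the key-value pairs, then gather contacts into sorted order.
   The gather loop performs the same copy as ReorderContactKernel. */
void sortAndReorderContacts(const Contact *in, Contact *out,
                            SortData *sortData, size_t n)
{
    assert(in != out);  /* the gather is not safe in place */
    qsort(sortData, n, sizeof(SortData), compareSortData);
    for (size_t gIdx = 0; gIdx < n; ++gIdx) {
        out[gIdx] = in[sortData[gIdx].y];
    }
}
```

After the call, contacts that share a cell key are contiguous in out, which is what allows each cell to be handed to its own group of work-items.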
4.4.2 Pipelined Local Batching
Each constraint group can be processed in parallel; thus, each group is assigned
to a SIMD unit of the GPU in a single kernel dispatch. However, the batching algorithm
described in Section 4.2.2 is a completely serial process, which is inefficient if
it is executed on a GPU. The proposed pipelined local batching transforms the
serial batching algorithm into a pipelined parallel algorithm. Pipelined local
batching decomposes the while loop of Algorithm 4.1 into its iterations and
processes them in parallel. This is not straightforward, because each iteration
of the while loop depends on the result of the previous iteration.
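The dependency between iterations is easier to see in code. Algorithm 4.1 is not reproduced in this section, so the C sketch below assumes a common greedy scheme rather than the author's exact algorithm: each pass of the while loop builds one batch by taking every unbatched constraint whose two bodies are not yet used in that batch. The names serialBatch and MAX_BODIES, and the reduced Constraint struct, are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_BODIES 64  /* hypothetical limit for this sketch */

typedef struct { int m_bodyA, m_bodyB; } Constraint;

/* Assign a batch index to every constraint so that no two constraints
   in the same batch share a body. Returns the number of batches. */
int serialBatch(const Constraint *c, int *batchOut, int n)
{
    unsigned char locked[MAX_BODIES];
    int remaining = n;
    int batch = 0;

    for (int i = 0; i < n; ++i) {
        assert(c[i].m_bodyA >= 0 && c[i].m_bodyA < MAX_BODIES);
        assert(c[i].m_bodyB >= 0 && c[i].m_bodyB < MAX_BODIES);
        batchOut[i] = -1;  /* -1 marks "not batched yet" */
    }

    while (remaining > 0) {
        /* Every pass starts with all bodies unlocked, but it only sees
           the constraints that earlier passes left unbatched. */
        memset(locked, 0, sizeof locked);
        for (int i = 0; i < n; ++i) {
            int a = c[i].m_bodyA;
            int b = c[i].m_bodyB;
            if (batchOut[i] != -1 || locked[a] || locked[b]) continue;
            locked[a] = 1;
            locked[b] = 1;
            batchOut[i] = batch;
            --remaining;
        }
        ++batch;
    }
    return batch;
}
```

In this sketch, pass k cannot start until pass k - 1 has decided which constraints it took, which is the serial dependency that makes the loop hard to run in parallel as written.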