    PrefixScan(counts, offsets);
    // Reorder contacts using sortData
    execute(ReorderContactKernel, contacts, contactsSorted, sortData);
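To make the role of the scan concrete, here is a minimal serial sketch in plain C. It assumes that PrefixScan is an exclusive scan, so that each entry of offsets holds the first output slot for the corresponding entry of counts; the text does not state this, and the function name prefix_scan_exclusive is hypothetical.

```c
#include <stddef.h>

/* Serial reference of an exclusive prefix scan (an assumption about what
   PrefixScan computes): offsets[i] is the sum of counts[0..i-1], i.e. the
   first output slot that belongs to bucket i. */
static void prefix_scan_exclusive(const int *counts, int *offsets, size_t n)
{
    int running = 0;
    for (size_t i = 0; i < n; ++i) {
        offsets[i] = running;   /* elements before bucket i */
        running += counts[i];   /* add bucket i for the next entry */
    }
}
```

A GPU implementation would compute the same values in parallel; the serial loop is only meant to show what the output contains.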
SetSortDataKernel computes the cell index and writes a key-value pair for each
contact as follows.
    __kernel void SetSortDataKernel(__global Contact* gContact,
                                    __global Body* gBodies,
                                    __global int2* gSortDataOut)
    {
        int gIdx = get_global_id(0);
        int aIdx = gContact[gIdx].m_bodyA;
        int bIdx = gContact[gIdx].m_bodyB;  // read but not used in this listing
        // The cell is chosen from the position of body A.
        float4 p = gBodies[aIdx].m_pos;
        int xIdx = convertToCellIdx(p.x);
        int zIdx = convertToCellIdx(p.z);
        // Key: unique cell ID. Value: original contact index.
        gSortDataOut[gIdx].x = computeUniqueId(xIdx, zIdx);
        gSortDataOut[gIdx].y = gIdx;
    }
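The listing calls convertToCellIdx and computeUniqueId without showing them. The following plain C sketch shows one plausible way such helpers could work on a uniform grid. Everything in it is an assumption made for illustration: the constants CELL_SIZE, GRID_ORIGIN, and GRID_DIM, the clamping to the grid bounds, and the row-major ID layout are not taken from the text.

```c
/* Hypothetical grid parameters, chosen only for this sketch. */
#define CELL_SIZE   2.0f    /* edge length of one cell */
#define GRID_ORIGIN (-16.0f) /* coordinate of the lower grid boundary */
#define GRID_DIM    16      /* number of cells along each axis */

/* Maps a world-space coordinate to a cell index in [0, GRID_DIM - 1]. */
static int convert_to_cell_idx(float coord)
{
    float t = (coord - GRID_ORIGIN) / CELL_SIZE;
    int idx = (int)t;               /* truncates toward zero */
    if ((float)idx > t) {
        --idx;                      /* turn truncation into floor for t < 0 */
    }
    if (idx < 0) idx = 0;           /* clamp positions outside the grid */
    if (idx > GRID_DIM - 1) idx = GRID_DIM - 1;
    return idx;
}

/* Combines two cell indices into one key (row-major layout). */
static int compute_unique_id(int xIdx, int zIdx)
{
    return zIdx * GRID_DIM + xIdx;
}
```

With a layout like this, contacts whose body A lies in the same cell receive the same key, so sorting by key places them next to each other.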
Once the key-value pairs are sorted, contacts are reordered in ReorderContactKernel.
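    __kernel void ReorderContactKernel(__global Contact* in,
                                       __global Contact4* out,
                                       __global int2* sortData)
    {
        int gIdx = get_global_id(0);
        // The value of the sorted pair is the contact's original index.
        int srcIdx = sortData[gIdx].y;
        // Gather: copy that contact into its sorted position.
        out[gIdx] = in[srcIdx];
    }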
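The sort-then-gather step can be checked on the CPU with a small reference in plain C. This is a sketch under stated assumptions: the Int2 and Contact structs are minimal hypothetical stand-ins for the types in the listings, and the standard library qsort stands in for whatever GPU sort is actually used. The comparison also breaks ties by the original index to keep the result deterministic, which the text does not require.

```c
#include <stdlib.h>

typedef struct { int x, y; } Int2;                /* x: key (cell ID), y: contact index */
typedef struct { int m_bodyA, m_bodyB; } Contact; /* hypothetical minimal contact */

/* Orders pairs by key, then by original index so the result is deterministic. */
static int compare_key(const void *a, const void *b)
{
    const Int2 *pa = (const Int2 *)a;
    const Int2 *pb = (const Int2 *)b;
    if (pa->x != pb->x) {
        return (pa->x > pb->x) - (pa->x < pb->x);
    }
    return (pa->y > pb->y) - (pa->y < pb->y);
}

/* Sorts the key-value pairs, then gathers contacts into sorted order,
   mirroring what ReorderContactKernel does per work item. */
static void reorder_contacts(const Contact *in, Contact *out,
                             Int2 *sortData, size_t n)
{
    qsort(sortData, n, sizeof(Int2), compare_key);
    for (size_t i = 0; i < n; ++i) {
        out[i] = in[sortData[i].y];
    }
}
```

Because only the small key-value pairs are moved during the sort, each contact is copied exactly once, in the final gather.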
4.4.2 Pipelined Local Batching
Each constraint group can be processed in parallel; thus, it is assigned to a
SIMD of a GPU in a single kernel dispatch. However, the batching algorithm
described in Section 4.2.2 is a completely serial process, which is inefficient if
it is executed on a GPU. The proposed pipelined local batching transforms the
serial batching algorithm into a pipelined parallel algorithm. Pipelined local
batching decomposes the while loop of Algorithm 4.1 into its iterations and
processes them in parallel. However, each iteration of the while loop depends on
the previous iteration, so the loop is not straightforward to parallelize.
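
The dependency between iterations can be seen in a generic greedy batching loop, sketched below in plain C. This is an assumption about the general shape of a serial batching algorithm, not a transcription of Algorithm 4.1; the Pair struct, the MAX_BODIES bound, and the function name create_batches are hypothetical.

```c
#include <string.h>

#define MAX_BODIES 64  /* hypothetical upper bound on body indices */

typedef struct { int a, b; } Pair;  /* the two bodies of one constraint */

/* Greedy serial batching: writes a batch index for every constraint into
   batchOut and returns the number of batches. Within one batch, no body
   is shared by two constraints. Body indices must be below MAX_BODIES. */
static int create_batches(const Pair *pairs, int *batchOut, int n)
{
    unsigned char locked[MAX_BODIES];
    int remaining = n;
    int batch = 0;

    for (int i = 0; i < n; ++i) {
        batchOut[i] = -1;  /* -1 marks a constraint that is not batched yet */
    }

    while (remaining > 0) {
        /* Each pass starts with all bodies unlocked... */
        memset(locked, 0, sizeof locked);
        for (int i = 0; i < n; ++i) {
            if (batchOut[i] != -1) {
                continue;  /* ...but only sees what earlier passes left over */
            }
            if (!locked[pairs[i].a] && !locked[pairs[i].b]) {
                locked[pairs[i].a] = 1;
                locked[pairs[i].b] = 1;
                batchOut[i] = batch;
                --remaining;
            }
        }
        ++batch;
    }
    return batch;
}
```

The input of each pass of the while loop is the set of constraints rejected by the previous pass, which is why the passes cannot simply be run side by side.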