Biomedical Engineering Reference
In-Depth Information
Algorithm 1. Page-Tile SAXS algorithm
Input : scattering momenta, form factors table, scattering particles
Output : intensity curves for the scattering momenta
/*Host program*/
Initialize the parallel algorithm
Transfer input data to GPU global memory and queue the kernels for initial profile calculation
/*Kernels executed on the GPU*/
Map form factors to scattering particles (Kernel 1)
Compute the Debye sum term for each tile (Kernel 2)
Perform vertical tile sum reduction for each page (Kernel 3)
Perform horizontal margin sum reduction for each page to get the intensity curve (Kernel 4)
/*Host program*/
Retrieve the results from GPU global memory
have a type in the form of an index into the form factor table. The initial intensity curve
calculation comprises four kernels.
In Kernel 1 the form factor table is mapped into a form suited for hardware-efficient
parallel access. The form factors are organized by scattering particle, which enables
the work-groups with streaming memory access for both center coordinates and form
factors.
The majority of execution time is spent in Kernel 2, where the Debye sum terms
for the individual tiles are computed. The Debye formula is used for each term, but i
and j are limited to the ranges defined by the boundaries of the tile within the global
index space. The kernel uses local memory to improve performance, by pre-loading
the particles and their form factors, and by performing an in-place parallel reduction
to produce the partial sum for the tile. During the Debye calculation, 4x loop unrolling
utilizes local registers to further optimize this stage.
Kernel 3 reduces the tile partial sums, which are stored in a global cache, to bottom
margin sums that are further reduced by Kernel 4 to yield the final intensity curve.
Tile Recalculation. Markov chain Monte Carlo simulations explore the conforma-
tional space of the protein structures by applying partial modifications to an accepted
proposal. The average SAXS computation is therefore a partial re-evaluation of a pre-
viously computed profile, where only a subset of the bodies changed their position.
It is therefore possible to identify the subset of tiles that needs to be updated. Since
the Page-Tile algorithm caches the partial contribution of each tile to the global sum-
mation, we can impose a partial recalculation of only the affected tiles (see Fig. 5). This
leads to a substantial reduction in the time necessary to derive an intensity curve after a
Monte Carlo transition (move).
Algorithm 2 illustrates the pseudocode for tile recalculation. The form factor table
from the initial calculation is reused, so execution starts directly with Kernel 5, which
identifies the affected tiles and invokes Kernel 2 for them. Kernel 3 and Kernel 4 per-
form the reductions as in the initial calculation.
Floating Point Precision. Floating point numbers can be stored and manipulated with
single precision (SP) or double precision (DP). Mathematical operations on floating
 
Search WWH ::




Custom Search