Biomedical Engineering Reference
Fig. 4. Domain decomposition for the Page-Tile algorithm. Work-groups operate on square
tiles from the matrix. Only tiles in the lower-left part and the diagonal are evaluated.
parallelization, into pages. Each page represents the computation of the intensity curve
I(q) for a single value of q. A page can be visualized as a square problem matrix of
side equal to the number of scattering particles M, with each cell representing the
contribution of a single term of the Debye formula for a particular pair of particles i and j.
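The notion of a page can be made concrete with a minimal sketch. It assumes the standard form of the Debye formula, I(q) = sum over i and j of F_i(q) F_j(q) sin(q r_ij) / (q r_ij), and assumes that the form factors passed in have already been evaluated at the given q. The names debye_term, debye_page and intensity are hypothetical and do not come from the original algorithm.

```python
import math


def debye_term(pos_i, pos_j, f_i, f_j, q):
    """One cell of the page: F_i * F_j * sin(q r_ij) / (q r_ij)."""
    r = math.dist(pos_i, pos_j)  # Euclidean distance between the two particles
    x = q * r
    # sin(x)/x tends to 1 as x -> 0, which covers i == j and q == 0.
    sinc = 1.0 if x == 0.0 else math.sin(x) / x
    return f_i * f_j * sinc


def debye_page(positions, form_factors, q):
    """Build the M x M problem matrix (one page) for a single value of q.

    Assumption: form_factors[i] is the form factor of particle i at this q.
    """
    m = len(positions)
    return [
        [debye_term(positions[i], positions[j], form_factors[i], form_factors[j], q)
         for j in range(m)]
        for i in range(m)
    ]


def intensity(page):
    """I(q) is the sum over every cell of the page."""
    return sum(sum(row) for row in page)
```

Because r_ij = r_ji, the matrix returned by debye_page is symmetric, which is the property the tile decomposition relies on.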
For performance and a direct mapping to the hardware, pages are partitioned
into square tiles of side k, where k is set to the specific compute unit size
of the OpenCL device. Since each problem matrix is symmetrical, only the tiles
encompassing the lower-left triangle and the diagonal are computed, and their values are
simply duplicated for the mirror tiles in the upper-right triangle of the matrix. The
domain decomposition is illustrated in Fig. 4 for an example of 16 scattering particles and
a work-group size of 4.
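The decomposition can be sketched on the host side as follows. This is an illustration under stated assumptions, not the OpenCL kernel itself: the helper names lower_triangle_tiles and sum_by_tiles are hypothetical, the matrix side is assumed to be a multiple of k already, and duplicating a tile for its mirror is modeled by counting each off-diagonal tile twice in the sum.

```python
def lower_triangle_tiles(m, k):
    """List the (tile_row, tile_col) pairs on or below the diagonal.

    m is the side of the problem matrix, k the tile side.
    """
    if m % k != 0:
        raise ValueError("m must be a multiple of k (pad the data first)")
    n = m // k  # number of tiles along one side
    return [(tr, tc) for tr in range(n) for tc in range(tr + 1)]


def sum_by_tiles(page, k):
    """Sum a symmetric matrix while visiting only lower-triangle tiles."""
    total = 0.0
    for tr, tc in lower_triangle_tiles(len(page), k):
        tile_sum = sum(
            page[i][j]
            for i in range(tr * k, (tr + 1) * k)
            for j in range(tc * k, (tc + 1) * k)
        )
        # An off-diagonal tile also stands in for its mirror tile.
        total += tile_sum if tr == tc else 2.0 * tile_sum
    return total
```

For the configuration of Fig. 4 (16 particles, tiles of side 4) the matrix holds 4 x 4 = 16 tiles, of which lower_triangle_tiles(16, 4) lists 10: the 4 diagonal tiles and the 6 below them.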
GPUs suffer performance penalties when they have to work with data that is not
aligned to their native architecture. The algorithm therefore pads the data and aligns
it to the specified work-group size. The resultant dummy particles participate in the
Debye calculations, but they are assigned a form factor of 0, so their contribution to the
intensity I(q) is zero.
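The effect of padding can be checked with a short sketch. It assumes the standard Debye sum with form factors already evaluated at q, and it places the dummy particles at the origin, a choice the text does not specify; pad_particles and debye_intensity are hypothetical names.

```python
import math


def pad_particles(positions, form_factors, k):
    """Pad the particle list up to the next multiple of k.

    Dummy particles get a form factor of 0. Their position (the origin
    here) is an arbitrary assumption, since it cannot affect the result.
    """
    m = len(positions)
    padded = -(-m // k) * k  # round m up to a multiple of k
    extra = padded - m
    return (
        list(positions) + [(0.0, 0.0, 0.0)] * extra,
        list(form_factors) + [0.0] * extra,
    )


def debye_intensity(positions, form_factors, q):
    """Plain double-sum Debye intensity, used here only as a reference."""
    total = 0.0
    for p_i, f_i in zip(positions, form_factors):
        for p_j, f_j in zip(positions, form_factors):
            x = q * math.dist(p_i, p_j)
            sinc = 1.0 if x == 0.0 else math.sin(x) / x
            total += f_i * f_j * sinc
    return total
```

Every term that involves a dummy particle carries a factor of 0, so debye_intensity returns the same value for the padded and the unpadded data.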
Algorithm 1 presents the pseudocode for the Page-Tile SAXS algorithm. The form
factors table, supplied as input, is packed and organized by scattering momentum and
particle type. The scattering particles, in addition to their position in three dimensions,
 