3 Results and Discussion

3.1 Computational Efficiency of the SAXS Modeling
The Debye formula (Equation 1) leads to a computational complexity of O(M^2), with M the number of scatterers in the structure under examination. Our coarse-grained approach reduces M by representing several atoms by one scattering body (a dummy atom), thereby lowering the complexity to O((M/k)^2), with k the number of scatterers (atoms) described by a dummy body.
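As an illustration of this scaling, a minimal CPU sketch of the Debye summation for a single scattering momentum is given below; the function and variable names are hypothetical and the paper's own implementation is not reproduced here. The double loop over all pairs of scattering bodies is what produces the O(M^2) cost, so shrinking M by a factor of k shrinks the cost by roughly k^2.

#include <math.h>
#include <stddef.h>

/* Minimal sketch of the Debye summation for one scattering momentum q.
 * pos: M scatterer coordinates (x,y,z); f: form factors f_i(q) per body.
 * The double loop over all pairs gives the O(M^2) cost; replacing k atoms
 * by one dummy body shrinks M to M/k and the cost to O((M/k)^2). */
double debye_intensity(const double (*pos)[3], const double *f, size_t M, double q)
{
    double I = 0.0;
    for (size_t i = 0; i < M; ++i) {
        for (size_t j = 0; j < M; ++j) {
            double dx = pos[i][0] - pos[j][0];
            double dy = pos[i][1] - pos[j][1];
            double dz = pos[i][2] - pos[j][2];
            double r  = sqrt(dx * dx + dy * dy + dz * dz);
            double x  = q * r;
            /* sin(x)/x -> 1 as x -> 0 (i == j, or coincident bodies) */
            double sinc = (x > 1e-12) ? sin(x) / x : 1.0;
            I += f[i] * f[j] * sinc;
        }
    }
    return I;
}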
The precise value of k depends on the primary sequence of the protein. On large datasets, the two-dummy-atom model leads to an average k of 4.24 (with a performance increase of k^2 ≈ 18). The single-body model leads to k ≈ 7.8, allowing for a k^2 ≈ 60 times faster execution.
3.2 GPGPU Implementation
The performance of the Page-Tile algorithm was measured on a test protein of over a thousand amino acids, modeled with 1888 scattering bodies in the dual dummy atom representation, and a discretization of q-space into 51 scattering momenta. Protein moves were modeled by a random mutation of 40% of the particles, to approximate the asymptotic move rate in a Monte Carlo simulation. The execution times for the model test case are presented in Table 1.
Table 1. Execution times for SAXS curve calculation for a protein with 1888 bodies, 51 scattering momenta and 21 form factors per momentum. Execution times from the top are for a single-core CPU implementation, a parallel GPGPU full computation, and a GPGPU partial computation, respectively. Partial computations mimic the costs in a Monte Carlo simulation, where at each step around 40% of the proposal structure is updated.

Algorithm                    Time (ms)
CPU SP (single core)             2408
GPGPU full calculation              9
GPGPU recalculation             6.484
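The gap between the full calculation and the recalculation rows reflects the fact that, when only part of the structure moves, only the pair terms involving moved bodies change. The paper's GPU-side Page-Tile update is not reproduced here; the plain-C sketch below (hypothetical names, single scattering momentum) only illustrates the generic incremental idea: subtract the moved bodies' old pair contributions and add their new ones, at a cost of O(mM) for m moved bodies instead of O(M^2) for a full pass.

#include <math.h>
#include <stdbool.h>
#include <stddef.h>

static double sinc_qr(double q, double r)
{
    double x = q * r;
    return (x > 1e-12) ? sin(x) / x : 1.0;
}

static double dist(const double a[3], const double b[3])
{
    double dx = a[0] - b[0], dy = a[1] - b[1], dz = a[2] - b[2];
    return sqrt(dx * dx + dy * dy + dz * dz);
}

/* Sum of all ordered-pair terms f_i f_j sinc(q r_ij) that involve at least
 * one moved body, for the given coordinates. moved[i] flags the bodies
 * touched by the Monte Carlo proposal. */
static double moved_contribution(const double (*pos)[3], const double *f,
                                 const bool *moved, size_t M, double q)
{
    double s = 0.0;
    for (size_t i = 0; i < M; ++i) {
        if (!moved[i]) continue;
        for (size_t j = 0; j < M; ++j) {
            double term = f[i] * f[j] * sinc_qr(q, dist(pos[i], pos[j]));
            /* moved-vs-fixed pairs appear twice in the full ordered-pair sum
             * but are enumerated once here, so weight them by 2; moved-vs-moved
             * ordered pairs are already enumerated once each by this loop. */
            s += moved[j] ? term : 2.0 * term;
        }
    }
    return s;
}

/* Incremental update: I_new = I_old - contrib(old coords) + contrib(new coords).
 * Only pairs touching moved bodies are revisited. */
double debye_update(double I_old,
                    const double (*old_pos)[3], const double (*new_pos)[3],
                    const double *f, const bool *moved, size_t M, double q)
{
    return I_old
         - moved_contribution(old_pos, f, moved, M, q)
         + moved_contribution(new_pos, f, moved, M, q);
}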
The performance of the algorithm was also measured for protein sizes ranging from 64 to 8192 scattering particles. Each protein was moved 1000 times, in order to obtain an average over the recalculation steps. Figure 6 shows the speed increase, relative to the CPU single-precision implementation, calculated as t_cpu / t_gpu.
Figure 6 also illustrates the hardware utilization of the parallel Page-Tile algorithm. The plot shows asymptotic behavior around problem sizes of 2000 scattering bodies. The GTX 560 Ti GPU employed in the tests is composed of 8 compute units, each operating on 8 cascading work-groups, allowing for a theoretical peak of 64 active work-groups. The work-group size is 32, so the card would reach its theoretical peak processing power at 2048 bodies. Our tests show saturation at the same level, indicating optimal use of the hardware.
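For reference, the device figures entering this estimate can be queried through the standard OpenCL host API. The sketch below is not the paper's host code; it assumes a kernel work-group size of 32 and 8 resident work-groups per compute unit, as stated above, the latter being an occupancy figure that OpenCL does not report directly.

#include <stdio.h>
#include <CL/cl.h>

int main(void)
{
    cl_platform_id platform;
    cl_device_id device;
    cl_uint compute_units;
    size_t max_wg_size;
    const unsigned groups_per_cu  = 8;  /* assumed occupancy, per the text */
    const unsigned workgroup_size = 32; /* work-group size used by the kernel */

    if (clGetPlatformIDs(1, &platform, NULL) != CL_SUCCESS ||
        clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL) != CL_SUCCESS) {
        fprintf(stderr, "no OpenCL GPU device found\n");
        return 1;
    }

    clGetDeviceInfo(device, CL_DEVICE_MAX_COMPUTE_UNITS,
                    sizeof(compute_units), &compute_units, NULL);
    clGetDeviceInfo(device, CL_DEVICE_MAX_WORK_GROUP_SIZE,
                    sizeof(max_wg_size), &max_wg_size, NULL);

    /* e.g. 8 compute units x 8 work-groups x 32 work-items = 2048 bodies */
    printf("compute units: %u, max work-group size: %zu\n",
           compute_units, max_wg_size);
    printf("estimated saturation point: %u bodies\n",
           compute_units * groups_per_cu * workgroup_size);
    return 0;
}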
OpenCL is thread-safe and allows access to the same device from multiple processes
and threads, so by creating multiple instances of the Page-Tile algorithm, more than one
 