Fig. 7. Relative speedup of PHAISTOS when using the GPGPU SAXS energy term vs. the best-
performing CPU configuration (four threads). Threads above four use hyper-threaded CPU cores,
so lower performance scaling is expected.
up with the number of CPU threads, due to two factors: 1) incomplete loading of the
GPU by each thread; and 2) concurrent MCMC calculations outside of the SAXS energy term.
When invoked from within a CPU thread, the GPU calculates the SAXS intensity
profile and is then idle, while the host thread processes the result and queues a new
structure for evaluation. Multiple CPU threads can take advantage of these idle GPU
cycles by queueing additional SAXS curve calculations, which produces the observed
performance scaling.
While the GPU accelerates the calculation of the SAXS energy term, the CPU host
still has to propose a new structure in each MCMC step and to process the result from
the energy function calculation. This CPU work limits the speedup of each MCMC step as a
whole. Running multiple host threads parallelizes these operations, which further
contributes to the performance scaling.
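The two effects described above can be illustrated with a minimal throughput model. This model is an assumption for illustration and is not taken from PHAISTOS. Each host thread alternates between CPU work (proposing a structure and processing the result, `cpu_ms`) and one SAXS evaluation on the GPU (`gpu_ms`). Every thread is assumed to run on its own physical core, and the GPU is simplified to serve one request at a time. The function name and the timings are hypothetical.

```python
def steps_per_second(n_threads: int, cpu_ms: float, gpu_ms: float) -> float:
    """Idealized MCMC steps per second when n host threads share one GPU.

    Hypothetical model: each thread spends cpu_ms on host work and gpu_ms
    on a SAXS evaluation per step; the GPU handles one evaluation at a time.
    """
    # Bound set by the threads: each finishes one step per (cpu_ms + gpu_ms).
    thread_bound = n_threads * 1000.0 / (cpu_ms + gpu_ms)
    # Bound set by the GPU: at most one evaluation per gpu_ms.
    gpu_bound = 1000.0 / gpu_ms
    return min(thread_bound, gpu_bound)


if __name__ == "__main__":
    # Hypothetical timings: host work per step is longer than the GPU call.
    for n in (1, 2, 4):
        print(n, round(steps_per_second(n, cpu_ms=2.0, gpu_ms=1.0), 1))
```

With a single thread, the GPU sits idle during the host work. Adding threads fills those idle cycles until the GPU-side bound is reached. The CPU share of each step (`cpu_ms`) sets how far a single thread falls short of that bound.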
The theoretically possible speedup under PHAISTOS is 18.6×. It follows from the Page-Tile
algorithm speedup (371× from Fig. 6), divided by the number of CPU threads in the baseline
(4) and by the speedup the CPU gains from using a cache and a sine lookup table (5×):
371 / (4 × 5) ≈ 18.6. The observed speedup of 16.4× approaches this theoretical maximum
and is clearly limited by the CPU-bound portions of the MCMC simulation.
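The speedup estimate above can be checked with a few lines of arithmetic. This sketch assumes, as the quoted numbers imply, that the 371× kernel speedup is divided by both the thread count and the cache/lookup-table gain of the CPU baseline. The variable names are illustrative.

```python
# Figures quoted in the text. The division reflects the assumption that the
# CPU baseline already benefits from 4 threads and a 5x cache/lookup-table gain.
kernel_speedup = 371.0     # Page-Tile GPU algorithm speedup (Fig. 6)
cpu_threads = 4            # threads in the best-performing CPU configuration
cache_lut_speedup = 5.0    # CPU gain from the cache and sine lookup table
observed_speedup = 16.4    # measured PHAISTOS speedup

theoretical_speedup = kernel_speedup / (cpu_threads * cache_lut_speedup)
efficiency = observed_speedup / theoretical_speedup  # fraction of the bound reached

print(f"theoretical: {theoretical_speedup:.2f}x")
print(f"observed/theoretical: {efficiency:.0%}")
```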
4 Conclusions
We have presented an efficient implementation of the forward model for the compu-
tation of Small Angle X-ray Scattering profiles, utilizing Graphics Processing Units.