Biomedical Engineering Reference
In-Depth Information
Fig. 6. Algorithm performance for problem sizes ranging from 64 to 8192 scattering bodies,
including the model test case of 1888, with error bars showing standard deviation. All SAXS
computations involve summation over 51 scattering momenta. The asymptotic behavior indicates
hardware saturation at around 2000 bodies, which is the theoretical maximum for the GPU model
used in the tests.
calculation can be run at the same time. This is especially relevant in the case of problem
sizes that would not lead to a full GPU saturation, therefore allowing for multi-threaded
Monte Carlo simulations.
3.3 Monte Carlo Performance
The effect of the GPGPU SAXS curve calculation on the overall MCMC simulation
of the 1888-body test protein was measured in PHAISTOS by gathering performance
data for 1 million MCMC steps with a varying number of concurrent CPU threads, with
and without GPU acceleration. The first 100,000 steps were discarded as burn-in and
the remaining 900,000 were used for analysis.
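The burn-in step described above can be illustrated with a minimal Python sketch. It runs a toy Metropolis chain on a standard normal target and drops the leading samples before analysis. The function names, the target distribution, and the step size are assumptions chosen for illustration only; they are not part of PHAISTOS, and the chain here is far shorter than the 1 million steps used in the measurement.

```python
import math
import random

def metropolis_chain(n_steps, step_size=1.0, seed=0):
    """Toy Metropolis sampler for a standard normal target (illustrative only)."""
    rng = random.Random(seed)
    x = 0.0
    samples = []
    for _ in range(n_steps):
        proposal = x + rng.uniform(-step_size, step_size)
        # Log acceptance ratio: log p(proposal) - log p(x) for p(x) ~ exp(-x^2 / 2)
        log_accept = 0.5 * (x * x - proposal * proposal)
        if log_accept >= 0.0 or rng.random() < math.exp(log_accept):
            x = proposal
        samples.append(x)
    return samples

def discard_burn_in(samples, burn_in):
    """Drop the first `burn_in` samples and keep the rest for analysis."""
    return samples[burn_in:]

chain = metropolis_chain(10_000)
kept = discard_burn_in(chain, 1_000)  # same 10% ratio as 100,000 of 1 million
```

With the same proportions as in the text, 10% of the chain is discarded and the remaining 90% is retained.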
The best-performing CPU SAXS algorithm was used in the evaluation. It caches the
terms of the Debye formula and uses a lookup table for the sine function, resulting in a
five-fold speed increase, at the expense of numerical precision. The MCMC simulation
was executed with one to six threads, the maximum possible under the test configuration
given the significant memory footprint per thread. The lowest execution time
was achieved with four threads, and this result was used as the basis for comparison
with the GPGPU version.
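To make the precision trade-off concrete, the following minimal Python sketch evaluates the Debye formula, I(q) = sum_i sum_j F_i F_j sin(q r_ij) / (q r_ij), once with the exact sine and once with a precomputed sine lookup table. The function names, the table size of 4096 entries, and the q-independent form factors are assumptions made for illustration; the sketch does not reproduce the term caching or the actual implementation used in PHAISTOS.

```python
import math

TWO_PI = 2.0 * math.pi

def build_sine_table(size=4096):
    """Precompute sin over one period at `size` evenly spaced angles."""
    return [math.sin(TWO_PI * k / size) for k in range(size)]

def table_sin(x, table):
    """Approximate sin(x) by the nearest table entry (no interpolation)."""
    size = len(table)
    index = int(round(x / TWO_PI * size)) % size
    return table[index]

def debye_intensity(q, form_factors, coords, sin_fn=math.sin):
    """Debye sum over all body pairs for one scattering momentum q.

    form_factors: one (assumed q-independent) form factor per body.
    coords: one (x, y, z) tuple per body.
    """
    n = len(coords)
    # Diagonal terms (i == j): sin(x)/x -> 1 as x -> 0
    total = sum(f * f for f in form_factors)
    for i in range(n):
        for j in range(i + 1, n):
            x = q * math.dist(coords[i], coords[j])
            sinc = sin_fn(x) / x if x > 0.0 else 1.0
            # Each unordered pair appears twice in the full double sum
            total += 2.0 * form_factors[i] * form_factors[j] * sinc
    return total

coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
form_factors = [1.0, 1.0, 1.0]
table = build_sine_table()

exact = debye_intensity(1.0, form_factors, coords)
approx = debye_intensity(1.0, form_factors, coords,
                         sin_fn=lambda x: table_sin(x, table))
```

The difference between `exact` and `approx` shrinks as the table grows, which is the kind of speed-versus-precision trade-off the text describes.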
The MCMC simulation, using the GPGPU Page-Tile algorithm, was executed with
one to eight CPU threads. Execution time was compared to the multi-threaded CPU
version (Fig. 7).
The GPU-accelerated MCMC simulation exhibits consistently better performance
compared to the best-performing multi-threaded CPU version. The speed increase scales