■ Relatively small size of GPU memory. A commonsense use of faster computation is to solve bigger problems, and bigger problems often have a larger memory footprint. This GPU inconsistency between speed and size can be addressed with more memory capacity. The challenge is to maintain high bandwidth while increasing capacity.
■ Direct I/O to GPU memory. Real programs do I/O to storage devices as well as to frame buffers, and large programs can require a lot of I/O as well as a sizeable memory. Today's GPU systems must transfer between I/O devices and system memory and then between system memory and GPU memory. This extra hop significantly lowers I/O performance in some programs, making GPUs less attractive. Amdahl's law warns us what happens when you neglect one piece of the task while accelerating others; the sketch after this list makes the arithmetic concrete. We expect that future GPUs will make all I/O first-class citizens, just as frame buffer I/O is today.
■ Unified physical memories. An alternative solution to the prior two bullets is to have a single physical memory for the system and GPU, just as some inexpensive GPUs do for PMDs and laptops. The AMD Fusion architecture, announced just as this edition was being finished, is an initial merger between traditional GPUs and traditional CPUs. NVIDIA also announced Project Denver, which combines an ARM scalar processor with NVIDIA GPUs in a single address space. When these systems are shipped, it will be interesting to learn just how tightly integrated they are and the impact of integration on performance and energy of both data-parallel and graphics applications.
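As a back-of-the-envelope illustration of the Amdahl's law point above, the following sketch (with made-up numbers) computes the overall speedup when the GPU-acceleratable compute is sped up but the I/O hop through system memory is not:

#include <stdio.h>

/* Amdahl's law: overall speedup when a fraction f of run time is
   accelerated by a factor s and the remaining (1 - f) is untouched. */
static double amdahl(double f, double s)
{
    return 1.0 / ((1.0 - f) + f / s);
}

int main(void)
{
    /* Hypothetical numbers: 90% of run time is compute sped up 10x
       by the GPU; the remaining 10% is I/O left unaccelerated. */
    printf("Overall speedup: %.2fx\n", amdahl(0.90, 10.0)); /* ~5.26x */
    return 0;
}

Even with a 10x compute speedup, the neglected 10% of the task caps the overall gain at roughly 5.26x, which is why eliminating the extra I/O hop matters.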
Having covered the many versions of SIMD, the next chapter dives into the realm of MIMD.
4.10 Historical Perspective and References
Section L.6 (available online) features a discussion on the Illiac IV (a representative of the early
SIMD architectures) and the Cray-1 (a representative of vector architectures). We also look at
multimedia SIMD extensions and the history of GPUs.
Case Study and Exercises by Jason D. Bakos
Case Study: Implementing a Vector Kernel on a Vector Processor and GPU
Concepts illustrated by this case study
■ Programming Vector Processors
■ Programming GPUs
■ Performance Estimation
MrBayes is a popular and well-known computational biology application for inferring the evolutionary histories among a set of input species based on their multiply aligned DNA sequence data of length n. MrBayes works by performing a heuristic search over the space of all binary tree topologies for which the inputs are the leaves. In order to evaluate a particular tree, the application must compute an n × 4 conditional likelihood table (named clP) for each interior node. The table is a function of the conditional likelihood tables of the node's two descendant nodes (clL and clR, single-precision floating point) and their associated n × 4 × 4 transition probability tables (tiPL and tiPR, single-precision floating point). One of this application's kernels is the computation of this conditional likelihood table and is shown below:
for (k=0; k<seq_length; k++) {
    clP[h++] = (tiPL[AA]*clL[A] + tiPL[AC]*clL[C] + tiPL[AG]*clL[G] + tiPL[AT]*clL[T])
             * (tiPR[AA]*clR[A] + tiPR[AC]*clR[C] + tiPR[AG]*clR[G] + tiPR[AT]*clR[T]);
    /* ...analogous statements compute the C, G, and T entries of clP... */
}
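To make the indexing pattern explicit, here is a minimal self-contained restatement of the kernel; it assumes (as the fragment above suggests) that the A..T symbols are offsets into each site's 4-entry likelihood row and the AA..TT symbols are offsets into the flattened 4 × 4 transition matrices, with all tables stored row-major:

/* Hypothetical restatement with explicit loops: i indexes the parent
   nucleotide (A,C,G,T) and j the child nucleotide, so tiPL[16*k + 4*i + j]
   plays the role of the AA..TT constants and clL[4*k + j] the role of
   the A..T constants in the fragment above. */
for (k = 0; k < seq_length; k++) {
    for (i = 0; i < 4; i++) {
        float left = 0.0f, right = 0.0f;
        for (j = 0; j < 4; j++) {
            left  += tiPL[16*k + 4*i + j] * clL[4*k + j];
            right += tiPR[16*k + 4*i + j] * clR[4*k + j];
        }
        clP[4*k + i] = left * right;   /* one entry of the n x 4 table */
    }
}

Each site thus requires two 4 × 4 matrix-vector products followed by four elementwise multiplies, a structure that the vector-processor and GPU implementations in the exercises can exploit.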