4.1 Introduction
A question for the single instruction, multiple data (SIMD) architecture, which Chapter 1 in-
troduced, has always been just how wide a set of applications has significant data-level paral-
lelism (DLP). Fifty years later, the answer is not only the matrix-oriented computations of sci-
entific computing, but also the media-oriented image and sound processing. Moreover, since
a single instruction can launch many data operations, SIMD is potentially more energy effi-
cient than multiple instruction, multiple data (MIMD), which needs to fetch and execute one
instruction per data operation. These two answers make SIMD attractive for Personal Mobile
Devices. Finally, perhaps the biggest advantage of SIMD versus MIMD is that the program-
mer continues to think sequentially yet achieves parallel speedup by having parallel data op-
erations.
This chapter covers three variations of SIMD: vector architectures, multimedia SIMD in-
struction set extensions, and graphics processing units (GPUs).
The first variation, which predates the other two by more than 30 years, means essentially
pipelined execution of many data operations. These vector architectures are easier to under-
stand and to compile to than other SIMD variations, but they were considered too expensive
for microprocessors until recently. Part of that expense was in transistors and part was in the
cost of sufficient DRAM bandwidth, given the widespread reliance on caches to meet memory
performance demands on conventional microprocessors.
The second SIMD variation borrows the SIMD name to mean basically simultaneous paral-
lel data operations and is found in most instruction set architectures today that support mul-
timedia applications. For x86 architectures, the SIMD instruction extensions started with the
MMX (Multimedia Extensions) in 1996, which were followed by several SSE (Streaming SIMD
Extensions) versions in the next decade, and they continue to this day with AVX (Advanced
Vector Extensions). To get the highest computation rate from an x86 computer, you often need
to use these SIMD instructions, especially for floating-point programs.
The third variation on SIMD comes from the GPU community, offering higher potential per-
formance than is found in traditional multicore computers today. While GPUs share features
with vector architectures, they have their own distinguishing characteristics, in part due to
the ecosystem in which they evolved. This environment has a system processor and system
memory in addition to the GPU and its graphics memory. In fact, to recognize those distinc-
tions, the GPU community refers to this type of architecture as heterogeneous.
For problems with lots of data parallelism, all three SIMD variations share the advantage
of being easier for the programmer than classic parallel MIMD programming. To put into
perspective the importance of SIMD versus MIMD, Figure 4.1 plots the number of cores for
MIMD versus the number of 32-bit and 64-bit operations per clock cycle in SIMD mode for x86
computers over time.