4.1 Introduction
A question for the single instruction, multiple data (SIMD) architecture, which Chapter 1 in-
troduced, has always been just how wide a set of applications has significant data-level paral-
lelism (DLP). Fifty years later, the answer is not only the matrix-oriented computations of sci-
entific computing, but also the media-oriented image and sound processing. Moreover, since
a single instruction can launch many data operations, SIMD is potentially more energy effi-
cient than multiple instruction, multiple data (MIMD), which needs to fetch and execute one
instruction per data operation. These two answers make SIMD attractive for Personal Mobile
Devices. Finally, perhaps the biggest advantage of SIMD versus MIMD is that the program-
mer continues to think sequentially yet achieves parallel speedup by having parallel data op-
erations.
This chapter covers three variations of SIMD: vector architectures, multimedia SIMD in-
struction set extensions, and graphics processing units (GPUs).
The first variation, which predates the other two by more than 30 years, means essentially
pipelined execution of many data operations. These vector architectures are easier to under-
stand and to compile to than other SIMD variations, but they were considered too expensive
for microprocessors until recently. Part of that expense was in transistors and part was in the
cost of sufficient DRAM bandwidth, given the widespread reliance on caches to meet memory
performance demands on conventional microprocessors.
The second SIMD variation borrows the SIMD name to mean basically simultaneous paral-
lel data operations and is found in most instruction set architectures today that support mul-
timedia applications. For x86 architectures, the SIMD instruction extensions started with the
MMX (Multimedia Extensions) in 1996, which were followed by several SSE (Streaming SIMD
Extensions) versions in the next decade, and they continue to this day with AVX (Advanced
Vector Extensions). To get the highest computation rate from an x86 computer, you often need
to use these SIMD instructions, especially for floating-point programs.
The third variation on SIMD comes from the GPU community, offering higher potential per-
formance than is found in traditional multicore computers today. While GPUs share features
with vector architectures, they have their own distinguishing characteristics, in part due to
the ecosystem in which they evolved. This environment has a system processor and system
memory in addition to the GPU and its graphics memory. In fact, to recognize those distinc-
tions, the GPU community refers to this type of architecture as heterogeneous.
For problems with lots of data parallelism, all three SIMD variations share the advantage
of being easier for the programmer than classic parallel MIMD programming. To put into
perspective the importance of SIMD versus MIMD, Figure 4.1 plots the number of cores for
MIMD versus the number of 32-bit and 64-bit operations per clock cycle in SIMD mode for x86
computers over time.