1. Data-Level Parallelism (DLP) arises because there are many data items that can be operated on at the same time.
2. Task-Level Parallelism (TLP) arises because tasks of work are created that can operate independently and largely in parallel (the sketch following this list contrasts the two).
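To make the distinction concrete, the sketch below contrasts the two kinds of application parallelism in C++. The array, the scaling factor, and the two task functions are hypothetical names chosen only for illustration; it is a minimal sketch rather than an example drawn from any particular system.

// Contrast of data-level and task-level parallelism.
// Build with something like: g++ -std=c++14 -pthread dlp_tlp.cpp
#include <cstddef>
#include <cstdio>
#include <thread>
#include <vector>

// Data-level parallelism: the same operation is applied to many data
// items, and every iteration is independent of the others.
void scale(std::vector<double>& a, double k) {
    for (std::size_t i = 0; i < a.size(); ++i)
        a[i] *= k;                       // identical work on each element
}

// Task-level parallelism: unrelated pieces of work (hypothetical tasks)
// that can run independently and largely in parallel.
void compress_log()  { std::puts("compressing log..."); }
void index_catalog() { std::puts("indexing catalog..."); }

int main() {
    std::vector<double> data(1'000'000, 1.0);
    scale(data, 2.0);                    // DLP: a vector unit or GPU could
                                         // perform many iterations at once

    std::thread t1(compress_log);        // TLP: distinct tasks, each with
    std::thread t2(index_catalog);       // its own instruction stream
    t1.join();
    t2.join();
}

In the loop every element receives the same operation, which is what vector hardware and GPUs exploit; the two threads carry out different work with separate instruction streams, which is what multiprocessors exploit.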
Computer hardware in turn can exploit these two kinds of application parallelism in four
major ways:
1. Instruction-Level Parallelism exploits data-level parallelism at modest levels with compiler help using ideas like pipelining and at medium levels using ideas like speculative execution.
2. Vector Architectures and Graphics Processing Units (GPUs) exploit data-level parallelism by applying a single instruction to a collection of data in parallel (a minimal sketch of this style follows the list).
3. Thread-Level Parallelism exploits either data-level parallelism or task-level parallelism in a
tightly coupled hardware model that allows for interaction among parallel threads.
4. Request-Level Parallelism exploits parallelism among largely decoupled tasks specified by
the programmer or the operating system.
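As a concrete illustration of the second way, the sketch below applies a single SIMD instruction to four data elements at once. It assumes an x86 processor with SSE and a compiler that provides the <immintrin.h> intrinsics; the array values are arbitrary, and the sketch is illustrative rather than a recommended coding style.

// One _mm_add_ps instruction performs four floating-point additions:
// a single instruction stream operating on multiple data elements.
#include <immintrin.h>
#include <cstdio>

int main() {
    alignas(16) float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    alignas(16) float b[4] = {10.0f, 20.0f, 30.0f, 40.0f};
    alignas(16) float c[4];

    __m128 va = _mm_load_ps(a);          // load four floats into one register
    __m128 vb = _mm_load_ps(b);
    __m128 vc = _mm_add_ps(va, vb);      // one instruction, four additions
    _mm_store_ps(c, vc);

    for (int i = 0; i < 4; ++i)
        std::printf("%.1f ", c[i]);      // prints 11.0 22.0 33.0 44.0
    std::printf("\n");
}

In practice the same effect is usually obtained by letting a vectorizing compiler transform an ordinary loop, but the intrinsic form makes the one-instruction, many-data relationship explicit.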
These four ways for hardware to support the data-level parallelism and task-level parallelism go back 50 years. When Michael Flynn [1966] studied the parallel computing efforts in the 1960s, he found a simple classification whose abbreviations we still use today. He looked at the parallelism in the instruction and data streams called for by the instructions at the most constrained component of the multiprocessor, and placed all computers into one of four categories:
1. Single instruction stream, single data stream (SISD)—This category is the uniprocessor. The programmer thinks of it as the standard sequential computer, but it can exploit instruction-level parallelism. Chapter 3 covers SISD architectures that use ILP techniques such as superscalar and speculative execution.
2. Single instruction stream, multiple data streams (SIMD)—The same instruction is executed by multiple processors using different data streams. SIMD computers exploit data-level parallelism by applying the same operations to multiple items of data in parallel. Each processor has its own data memory (hence the MD of SIMD), but there is a single instruction memory and control processor, which fetches and dispatches instructions. Chapter 4 covers DLP and three different architectures that exploit it: vector architectures, multimedia extensions to standard instruction sets, and GPUs.
3. Multiple instruction streams, single data stream (MISD)—No commercial multiprocessor of this type has been built to date, but it rounds out this simple classification.
4. Multiple instruction streams, multiple data streams (MIMD)—Each processor fetches its own instructions and operates on its own data, and it targets task-level parallelism. In general, MIMD is more flexible than SIMD and thus more generally applicable, but it is inherently more expensive than SIMD. For example, MIMD computers can also exploit data-level parallelism, although the overhead is likely to be higher than would be seen in an SIMD computer. This overhead means that grain size must be sufficiently large to exploit the parallelism efficiently. Chapter 5 covers tightly coupled MIMD architectures, which exploit thread-level parallelism since multiple cooperating threads operate in parallel (see the sketch after this list). Chapter 6 covers loosely coupled MIMD architectures—specifically, clusters and warehouse-scale computers—that exploit request-level parallelism, where many independent tasks can proceed in parallel naturally with little need for communication or synchronization.
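To connect the fourth category to code, the sketch below shows the tightly coupled MIMD style using standard C++ threads. All names and sizes are illustrative assumptions; the point is that each thread fetches its own instructions and operates on its own slice of the data, and the chunk handed to each thread plays the role of the grain size mentioned above.

// Tightly coupled MIMD sketch: several threads, each with its own
// instruction stream, scale disjoint slices of a shared array.
// Build with something like: g++ -std=c++14 -pthread mimd.cpp
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

void scale_range(std::vector<double>& a, std::size_t lo, std::size_t hi,
                 double k) {
    for (std::size_t i = lo; i < hi; ++i)
        a[i] *= k;
}

int main() {
    std::vector<double> data(1'000'000, 1.0);
    unsigned n = std::thread::hardware_concurrency();
    if (n == 0) n = 4;                          // fallback if unknown

    std::vector<std::thread> workers;
    std::size_t chunk = data.size() / n;
    for (unsigned t = 0; t < n; ++t) {
        std::size_t lo = t * chunk;
        std::size_t hi = (t == n - 1) ? data.size() : lo + chunk;
        // each worker is an independent instruction stream on its own data
        workers.emplace_back(scale_range, std::ref(data), lo, hi, 2.0);
    }
    for (auto& w : workers) w.join();           // synchronize once at the end
}

If the slices are too small, the cost of creating and joining the threads outweighs the useful work, which is the grain-size concern raised in the MIMD description.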
This taxonomy is a coarse model, as many parallel processors are hybrids of the SISD, SIMD,
and MIMD classes. Nonetheless, it is useful to put a framework on the design space for the
computers we will see in this book.