1. Data-Level Parallelism (DLP) arises because there are many data items that can be operated on at the same time.
2. Task-Level Parallelism (TLP) arises because tasks of work are created that can operate independently and largely in parallel (the sketch following this list contrasts the two).
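To make the distinction concrete, the sketch below contrasts the two kinds of application parallelism in C++. The array, the scaling factor, and the two task functions are hypothetical names chosen only for illustration; it is a minimal sketch rather than an example drawn from any particular system.

// Contrast of data-level and task-level parallelism.
// Build with something like: g++ -std=c++14 -pthread dlp_tlp.cpp
#include <cstddef>
#include <cstdio>
#include <thread>
#include <vector>

// Data-level parallelism: the same operation is applied to many data
// items, and every iteration is independent of the others.
void scale(std::vector<double>& a, double k) {
    for (std::size_t i = 0; i < a.size(); ++i)
        a[i] *= k;                       // identical work on each element
}

// Task-level parallelism: unrelated pieces of work (hypothetical tasks)
// that can run independently and largely in parallel.
void compress_log()  { std::puts("compressing log..."); }
void index_catalog() { std::puts("indexing catalog..."); }

int main() {
    std::vector<double> data(1'000'000, 1.0);
    scale(data, 2.0);                    // DLP: a vector unit or GPU could
                                         // perform many iterations at once

    std::thread t1(compress_log);        // TLP: distinct tasks, each with
    std::thread t2(index_catalog);       // its own instruction stream
    t1.join();
    t2.join();
}

In the loop every element receives the same operation, which is what vector hardware and GPUs exploit; the two threads carry out different work with separate instruction streams, which is what multiprocessors exploit.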
Computer hardware in turn can exploit these two kinds of application parallelism in four
major ways:
1. Instruction-Level Parallelism exploits data-level parallelism at modest levels with compiler help using ideas like pipelining and at medium levels using ideas like speculative execution.
2. Vector Architectures and Graphics Processing Units (GPUs) exploit data-level parallelism by applying a single instruction to a collection of data in parallel (a minimal sketch of this style follows the list).
3. Thread-Level Parallelism exploits either data-level parallelism or task-level parallelism in a
tightly coupled hardware model that allows for interaction among parallel threads.
4. Request-Level Parallelism exploits parallelism among largely decoupled tasks specified by
the programmer or the operating system.
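As a concrete illustration of the second way, the sketch below applies a single SIMD instruction to four data elements at once. It assumes an x86 processor with SSE and a compiler that provides the <immintrin.h> intrinsics; the array values are arbitrary, and the sketch is illustrative rather than a recommended coding style.

// One _mm_add_ps instruction performs four floating-point additions:
// a single instruction stream operating on multiple data elements.
#include <immintrin.h>
#include <cstdio>

int main() {
    alignas(16) float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    alignas(16) float b[4] = {10.0f, 20.0f, 30.0f, 40.0f};
    alignas(16) float c[4];

    __m128 va = _mm_load_ps(a);          // load four floats into one register
    __m128 vb = _mm_load_ps(b);
    __m128 vc = _mm_add_ps(va, vb);      // one instruction, four additions
    _mm_store_ps(c, vc);

    for (int i = 0; i < 4; ++i)
        std::printf("%.1f ", c[i]);      // prints 11.0 22.0 33.0 44.0
    std::printf("\n");
}

In practice the same effect is usually obtained by letting a vectorizing compiler transform an ordinary loop, but the intrinsic form makes the one-instruction, many-data relationship explicit.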
These four ways for hardware to support the data-level parallelism and task-level parallelism go back 50 years. When Michael Flynn [1966] studied the parallel computing efforts in the 1960s, he found a simple classification whose abbreviations we still use today. He looked at the parallelism in the instruction and data streams called for by the instructions at the most constrained component of the multiprocessor, and placed all computers into one of four categories:
1. Single instruction stream, single data stream (SISD)—This category is the uniprocessor. The programmer thinks of it as the standard sequential computer, but it can exploit instruction-level parallelism. Chapter 3 covers SISD architectures that use ILP techniques such as superscalar and speculative execution.
2. Single instruction stream, multiple data streams (SIMD)—The same instruction is executed by multiple processors using different data streams. SIMD computers exploit data-level parallelism by applying the same operations to multiple items of data in parallel. Each processor has its own data memory (hence the MD of SIMD), but there is a single instruction memory and control processor, which fetches and dispatches instructions. Chapter 4 covers DLP and three different architectures that exploit it: vector architectures, multimedia extensions to standard instruction sets, and GPUs.
3. Multiple instruction streams, single data stream (MISD)—No commercial multiprocessor of this type has been built to date, but it rounds out this simple classification.
4. Multiple instruction streams, multiple data streams (MIMD)—Each processor fetches its own instructions and operates on its own data, and it targets task-level parallelism. In general, MIMD is more flexible than SIMD and thus more generally applicable, but it is inherently more expensive than SIMD. For example, MIMD computers can also exploit data-level parallelism, although the overhead is likely to be higher than would be seen in an SIMD computer. This overhead means that grain size must be sufficiently large to exploit the parallelism efficiently. Chapter 5 covers tightly coupled MIMD architectures, which exploit thread-level parallelism since multiple cooperating threads operate in parallel (see the sketch after this list). Chapter 6 covers loosely coupled MIMD architectures—specifically, clusters and warehouse-scale computers—that exploit request-level parallelism, where many independent tasks can proceed in parallel naturally with little need for communication or synchronization.
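To connect the fourth category to code, the sketch below shows the tightly coupled MIMD style using standard C++ threads. All names and sizes are illustrative assumptions; the point is that each thread fetches its own instructions and operates on its own slice of the data, and the chunk handed to each thread plays the role of the grain size mentioned above.

// Tightly coupled MIMD sketch: several threads, each with its own
// instruction stream, scale disjoint slices of a shared array.
// Build with something like: g++ -std=c++14 -pthread mimd.cpp
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

void scale_range(std::vector<double>& a, std::size_t lo, std::size_t hi,
                 double k) {
    for (std::size_t i = lo; i < hi; ++i)
        a[i] *= k;
}

int main() {
    std::vector<double> data(1'000'000, 1.0);
    unsigned n = std::thread::hardware_concurrency();
    if (n == 0) n = 4;                          // fallback if unknown

    std::vector<std::thread> workers;
    std::size_t chunk = data.size() / n;
    for (unsigned t = 0; t < n; ++t) {
        std::size_t lo = t * chunk;
        std::size_t hi = (t == n - 1) ? data.size() : lo + chunk;
        // each worker is an independent instruction stream on its own data
        workers.emplace_back(scale_range, std::ref(data), lo, hi, 2.0);
    }
    for (auto& w : workers) w.join();           // synchronize once at the end
}

If the slices are too small, the cost of creating and joining the threads outweighs the useful work, which is the grain-size concern raised in the MIMD description.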
This taxonomy is a coarse model, as many parallel processors are hybrids of the SISD, SIMD,
and MIMD classes. Nonetheless, it is useful to put a framework on the design space for the
computers we will see in this book.