subject of this section. We defer the discussion of data-related performance to
Section 38.6.
We are all familiar with doing more than one thing at the same time. For exam-
ple, you may think about computer graphics architecture while driving your car or
brushing your teeth. In computing we say that things are done in parallel if they are
in process at the same time. In this chapter, we'll further distinguish between true
and virtual parallelism. True parallelism employs separate mechanisms in physi-
cally concurrent operation. Virtual parallelism employs a single mechanism that
is switched rapidly from one sequential task to another, creating the appearance
of concurrent operation.
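As a minimal sketch of the distinction (the task functions here are purely
illustrative), consider the two-task C program below, written with POSIX
threads. The program merely specifies two concurrent tasks; whether they
execute with true parallelism (each thread on its own core) or with virtual
parallelism (both time-sliced on a single core) is decided by the hardware
and the operating system, not by the code.

    #include <pthread.h>
    #include <stdio.h>

    /* Two independent tasks. Whether they run with true parallelism
     * (on separate cores) or virtual parallelism (time-sliced on one
     * core) is invisible to this program. */
    static void *task_a(void *arg) {
        (void)arg;
        for (int i = 0; i < 3; i++) printf("task A, step %d\n", i);
        return NULL;
    }

    static void *task_b(void *arg) {
        (void)arg;
        for (int i = 0; i < 3; i++) printf("task B, step %d\n", i);
        return NULL;
    }

    int main(void) {
        pthread_t a, b;
        pthread_create(&a, NULL, task_a, NULL);
        pthread_create(&b, NULL, task_b, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        return 0;
    }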
Most parallelism that we are aware of when using a computer is virtual. For
example, the motion of a cursor and of the scroll bar it is dragging appear con-
current, but each is computed separately on a single processing unit. By allowing
computing resources to be shared, and indeed by allocating resources in propor-
tion to requirements, virtual parallelism facilitates efficiency in computing. But
it cannot scale performance beyond the peak that can be delivered by a single
computing element.
All computing hardware therefore employs true parallelism to increase com-
puting performance. For example, even a so-called scalar processor, which exe-
cutes a single instruction at a time, is in fact highly parallel in its hardware
implementation, employing separate specialized circuits concurrently for address
translation, instruction decoding, arithmetic operation, program-counter advance-
ment, and many other operations. At an even finer level of detail, both address
translation and arithmetic operations utilize binary-addition circuits that employ
per-bit full adders and “fast-carry” networks that all operate in parallel, allow-
ing the result to be computed in the period of a single instruction. In a modern,
high-performance integrated circuit, the longest sequential path typically employs
no more than 20 transistors, yet billions of transistors are employed overall; the
circuit must therefore be massively parallel.
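To see the adder example in code, the following C sketch models a 32-bit adder
as a chain of per-bit full adders (illustrative only; real datapaths are
specified in a hardware description language, not C). Software must evaluate
the 32 bit positions in sequence, but hardware instantiates all 32 full adders
as separate circuits and uses the fast-carry network to resolve the carries in
parallel, which is how the sum settles within a single cycle.

    #include <stdint.h>

    /* Model of a 32-bit adder built from per-bit full adders. Software
     * evaluates the bits one after another; hardware operates all 32
     * full adders concurrently, with a fast-carry network so that every
     * bit settles within one cycle. */
    uint32_t add32(uint32_t a, uint32_t b) {
        uint32_t sum = 0, carry = 0;
        for (int i = 0; i < 32; i++) {
            uint32_t ai = (a >> i) & 1, bi = (b >> i) & 1;
            uint32_t s  = ai ^ bi ^ carry;                  /* full-adder sum bit */
            carry       = (ai & bi) | (carry & (ai ^ bi));  /* full-adder carry-out */
            sum |= s << i;
        }
        return sum;
    }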
Because true hardware parallelism is an implementation artifact, it cannot be
specified architecturally. Instead, an architecture specifies parallelism that can be
implemented either virtually (by sharing hardware circuits) or truly (with separate
hardware circuits), or, typically, through a combination of the two. To better under-
stand these alternatives, and to illustrate that parallelism has long been central to
computing performance, we will briefly consider the architecture and implemen-
tation of a decades-old system: the CRAY-1 supercomputer.
Figure 38.5: The CRAY-1 supercomputer was the fastest system available when it was introduced in 1976. (Courtesy of Clemens Pfeiffer. Original image at http://upload.wikimedia.org/wikipedia/commons/f/f7/Cray-1-deutsches-museum.jpg.)
The CRAY-1 was developed by Cray Research, Inc., primarily to satisfy the
computing needs of the U.S. Department of Defense (see Figure 38.5). When
introduced in 1976, it was the fastest scalar processor in the world: Its cycle
time of 12.5 ns supported the execution of 80 million instructions per second.⁴
Key to its peak performance of 250 million floating-point operations per second
(MFLOPS), however, were special instructions that specified arithmetic opera-
tions on vectors of up to 64 operands. Data vectors were gathered from mem-
ory, stored in 64-location vector registers, and operated on by arithmetic vector
instructions (e.g., the vector sum or per-element product of two vector operands
could be computed); the results were then returned to main memory.
4. The analysis in this discussion is slightly simplified, because even in scalar operation
the CRAY-1 could in some cases execute two instructions per cycle.
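The semantics of one such vector instruction can be suggested by a scalar loop
(an illustrative C sketch; the CRAY-1 was of course programmed in its own
instruction set, not in C). A single instruction specified up to 64 per-element
operations, which the hardware streamed through a pipelined functional unit at
one result per 12.5 ns cycle once the pipeline filled; with several functional
units operating concurrently, throughput could rise far above the scalar rate,
toward the 250 MFLOPS peak.

    #define VLEN 64  /* a CRAY-1 vector register held up to 64 elements */

    /* Per-element vector multiply: the work of a single vector
     * instruction, written out as a scalar loop. The hardware gathered
     * the operands into vector registers, streamed them through a
     * pipelined multiply unit, and returned the results to memory. */
    void vmul(double dst[], const double a[], const double b[], int n)
    {
        /* n is the vector length, at most VLEN */
        for (int i = 0; i < n; i++)
            dst[i] = a[i] * b[i];
    }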