subject of this section. We defer the discussion of data-related performance to
Section 38.6.
We are all familiar with doing more than one thing at the same time. For exam-
ple, you may think about computer graphics architecture while driving your car or
brushing your teeth. In computing we say that things are done in parallel if they are
in process at the same time. In this chapter, we'll further distinguish between true
and virtual parallelism. True parallelism employs separate mechanisms in physi-
cally concurrent operation. Virtual parallelism employs a single mechanism that
is switched rapidly from one sequential task to another, creating the appearance
of concurrent operation.
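As a minimal sketch of the distinction (the task functions here are purely
illustrative), consider the two-task C program below, written with POSIX
threads. The program merely specifies two concurrent tasks; whether they
execute with true parallelism (each thread on its own core) or with virtual
parallelism (both time-sliced on a single core) is decided by the hardware
and the operating system, not by the code.

    #include <pthread.h>
    #include <stdio.h>

    /* Two independent tasks. Whether they run with true parallelism
     * (on separate cores) or virtual parallelism (time-sliced on one
     * core) is invisible to this program. */
    static void *task_a(void *arg) {
        (void)arg;
        for (int i = 0; i < 3; i++) printf("task A, step %d\n", i);
        return NULL;
    }

    static void *task_b(void *arg) {
        (void)arg;
        for (int i = 0; i < 3; i++) printf("task B, step %d\n", i);
        return NULL;
    }

    int main(void) {
        pthread_t a, b;
        pthread_create(&a, NULL, task_a, NULL);
        pthread_create(&b, NULL, task_b, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        return 0;
    }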
Most parallelism that we are aware of when using a computer is virtual. For
example, the motion of a cursor and of the scroll bar it is dragging appear con-
current, but each is computed separately on a single processing unit. By allowing
computing resources to be shared, and indeed by allocating resources in propor-
tion to requirements, virtual parallelism facilitates efficiency in computing. But
it cannot scale performance beyond the peak that can be delivered by a single
computing element.
All computing hardware therefore employs true parallelism to increase com-
puting performance. For example, even a so-called scalar processor, which exe-
cutes a single instruction at a time, is in fact highly parallel in its hardware
implementation, employing separate specialized circuits concurrently for address
translation, instruction decoding, arithmetic operation, program-counter advance-
ment, and many other operations. At an even finer level of detail, both address
translation and arithmetic operations utilize binary-addition circuits that employ
per-bit full adders and “fast-carry” networks that all operate in parallel, allow-
ing the result to be computed in the period of a single instruction. In a modern,
high-performance integrated circuit, the longest sequential path typically employs
no more than 20 transistors, yet billions of transistors are employed overall; the
circuit must therefore be massively parallel.
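To see the adder example in code, the following C sketch models a 32-bit adder
as a chain of per-bit full adders (illustrative only; real datapaths are
specified in a hardware description language, not C). Software must evaluate
the 32 bit positions in sequence, but hardware instantiates all 32 full adders
as separate circuits and uses the fast-carry network to resolve the carries in
parallel, which is how the sum settles within a single cycle.

    #include <stdint.h>

    /* Model of a 32-bit adder built from per-bit full adders. Software
     * evaluates the bits one after another; hardware operates all 32
     * full adders concurrently, with a fast-carry network so that every
     * bit settles within one cycle. */
    uint32_t add32(uint32_t a, uint32_t b) {
        uint32_t sum = 0, carry = 0;
        for (int i = 0; i < 32; i++) {
            uint32_t ai = (a >> i) & 1, bi = (b >> i) & 1;
            uint32_t s  = ai ^ bi ^ carry;                  /* full-adder sum bit */
            carry       = (ai & bi) | (carry & (ai ^ bi));  /* full-adder carry-out */
            sum |= s << i;
        }
        return sum;
    }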
Because true hardware parallelism is an implementation artifact, it cannot be
specified architecturally. Instead, an architecture specifies parallelism that can be
implemented either virtually (by sharing hardware circuits) or truly (with separate
hardware circuits), or, typically, through a combination of the two. To better under-
stand these alternatives, and to illustrate that parallelism has long been central to
computing performance, we will briefly consider the architecture and implemen-
tation of a decades-old system: the CRAY-1 supercomputer.
Figure 38.5: The CRAY-1 supercomputer was the fastest system available when it was introduced in 1976. (Courtesy of Clemens Pfeiffer. Original image at http://upload.wikimedia.org/wikipedia/commons/f/f7/Cray-1-deutsches-museum.jpg.)
The CRAY-1 was developed by Cray Research, Inc., primarily to satisfy the
computing needs of the U.S. Department of Defense (see Figure 38.5). When
introduced in 1976, it was the fastest scalar processor in the world: Its cycle
time of 12.5 ns supported the execution of 80 million instructions per second.⁴
Key to its peak performance of 250 million floating-point operations per second
(MFLOPS), however, were special instructions that specified arithmetic opera-
tions on vectors of up to 64 operands. Data vectors were gathered from mem-
ory, stored in 64-location vector registers, and operated on by arithmetic vector
instructions (e.g., the vector sum or per-element product of two vector operands
could be computed); the results were then returned to main memory.
4. The analysis in this discussion is slightly simplified, because even in scalar operation
the CRAY-1 could in some cases execute two instructions per cycle.
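The semantics of one such vector instruction can be suggested by a scalar loop
(an illustrative C sketch; the CRAY-1 was of course programmed in its own
instruction set, not in C). A single instruction specified up to 64 per-element
operations, which the hardware streamed through a pipelined functional unit at
one result per 12.5 ns cycle once the pipeline filled; with several functional
units operating concurrently, throughput could rise far above the scalar rate,
toward the 250 MFLOPS peak.

    #define VLEN 64  /* a CRAY-1 vector register held up to 64 elements */

    /* Per-element vector multiply: the work of a single vector
     * instruction, written out as a scalar loop. The hardware gathered
     * the operands into vector registers, streamed them through a
     * pipelined multiply unit, and returned the results to memory. */
    void vmul(double dst[], const double a[], const double b[], int n)
    {
        /* n is the vector length, at most VLEN */
        for (int i = 0; i < n; i++)
            dst[i] = a[i] * b[i];
    }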