Figure 38.4: NVIDIA GeForce 9800 GTX block diagram. The diagram shows the application driving the GPU chip over a PCIe interface; vertex, primitive, and fragment generation stages; 16 cores beneath a work queuing and distribution block, with one core shown in detail (instruction unit, eight DP/ALU pairs, two SFUs, and local memory); eight texture units (TU), each with an L1$; an interconnection network; four pixel ops blocks, each with an L2$; and four GDDR3 memories. (DP: data path; ALU: fp add and mult; SFU: special functions; TU: texture unit; L1$: 1st level cache; L2$: 2nd level cache.)
38.3.2 GPU Implementation
Now we turn our attention to an implementation of the graphics pipeline architecture—NVIDIA's GeForce 9800 GTX. Figure 38.4 is a block-diagram depiction of this implementation. Block names have been chosen to best illustrate the correspondence to the blocks on the pipeline block diagram shown in Figure 38.3. There is a one-to-one correspondence between the vertex, primitive, and fragment generation stages in the pipeline and the identically named blocks in the implementation diagram. Conversely, all three programmable stages in the pipeline correspond to the aggregation of 16 cores, eight texture units (TUs), and the block titled "Work queuing and distribution." This highly parallel, application-programmable complex of computing cores and fixed-function hardware is central to the GTX implementation—we'll have much more to say about it. Completing the correspondence, the pixel operation stage of the pipeline corresponds to the four pixel ops blocks in the implementation diagram, and the large memory block adjacent to the pipeline corresponds to the aggregation of the eight L1$, four L2$, and four GDDR3 memory blocks in the implementation. The PCIe interface and interconnection network implementation blocks represent a significant mechanism that has no counterpart in the architectural diagram. A few other significant blocks, such as display refresh (regularly transferring pixels from the output image to the display monitor) and memory controller logic (a complex and highly optimized circuit), have been omitted from the implementation diagram.
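To make the role of this programmable complex concrete, the following is a minimal CUDA sketch of the kind of data-parallel work it executes; the kernel, its name (brighten), and its inputs are illustrative assumptions, not code from the GeForce 9800 GTX's driver or from this chapter. Each thread computes one output value independently, which is exactly the property that lets the work queuing and distribution hardware spread thousands of threads across the cores.

#include <cuda_runtime.h>

// Hypothetical data-parallel kernel: each thread brightens one pixel value.
// The hardware's work distribution logic maps blocks of threads onto cores.
__global__ void brighten(float* pixels, float gain, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // unique index for this thread
    if (i < n)                                      // guard against running past the array
        pixels[i] = pixels[i] * gain;               // independent per-pixel work
}

int main()
{
    const int n = 1 << 20;                          // one million pixels (illustrative size)
    float* d_pixels;
    cudaMalloc(&d_pixels, n * sizeof(float));       // allocate device memory
    cudaMemset(d_pixels, 0, n * sizeof(float));     // placeholder image data

    int threadsPerBlock = 256;
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    brighten<<<blocks, threadsPerBlock>>>(d_pixels, 1.5f, n);
    cudaDeviceSynchronize();                        // wait for the kernel to finish

    cudaFree(d_pixels);
    return 0;
}

Because no thread depends on another's result, the same kernel runs unchanged on a chip with more or fewer cores; the hardware, not the program, decides how the threads are distributed.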
38.4 Parallelism
As we have seen, several decades of exponential increase in transistor count and performance have given computer system designers ample resources to design and build high-performance systems. Fundamentally, computing involves performing operations on data. Parallelism, the organization and orchestration of simultaneous operation, is the key to achieving high-performance operation. It is the