but constrains optimizations by the GPU driver (e.g., a compromise tiling of the
texture image may be chosen). 14
38.7.2 Cache Memory
We've discussed what data locality is, and seen some examples of how GPU archi-
tecture and implementation are designed to increase data locality. Now we'll see
how data locality can be exploited by system designers to greatly improve the
performance of computing systems.
Recall from Section 38.6.2 that latency and bandwidth, the two greatest con-
cerns of a memory-system designer, are inversely related to memory capacity:
Smaller memories have lower access latency and higher access bandwidth, that
is, they are faster than larger memories. Because locality of reference makes it
likely that, for a given short window of time, most memory accesses are to a small
number of physical memory blocks, we can improve performance by storing these
blocks in a smaller, faster memory. There are two ways to expose this local mem-
ory: explicitly and implicitly.
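The payoff of locality can be seen even without touching the memory system explicitly. The sketch below (an illustrative example, not from this chapter) sums the same matrix twice: once in row-major order, where consecutive accesses fall in the same cached block, and once in column-major order, where successive accesses are far apart and each may touch a different block. The results are identical; only the locality, and hence the performance, differs.

```c
#include <stddef.h>

#define N 1024

/* Row-major traversal: consecutive accesses touch consecutive
 * addresses, so each block fetched into fast local memory is
 * fully used before it is replaced. */
static long sum_row_major(int m[N][N]) {
    long total = 0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            total += m[i][j];
    return total;
}

/* Column-major traversal of the same data: successive accesses
 * are N * sizeof(int) bytes apart, so each access may fall in a
 * different block, wasting most of every block fetched. */
static long sum_col_major(int m[N][N]) {
    long total = 0;
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            total += m[i][j];
    return total;
}
```

On typical hardware the row-major version runs several times faster, even though both functions perform exactly the same arithmetic.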
The explicit approach directly exposes local memory in the architecture, giv-
ing programmers complete control of its use. Address fields of explicit local
memory (which is small) require fewer bits than addresses for the main memory
system (which is large), so their use reduces the bit rate of the processor instruction stream, while also benefiting from the reduced latency and increased bandwidth
to the specified data. Registers are an extreme form of explicit local memory: They
require very small addresses, and they exhibit negligible latency and huge band-
width. The local memory blocks in Figure 38.3 are a more typical form of explicit
local memory. Because these memories are not visible in either the OpenGL or the
Direct3D pipeline models, we defer discussion to Section 38.9.
Although registers are explicit in the Direct3D architecture, their allocation and use are managed by the shader compiler rather than by programmers. In principle, shader compilers could also manage and optimize the use of other local memory, but in practice this is left to human programmers. Programmer-specified
management of local memory is powerful, but it is also complex, time-consuming,
and error-prone. Thus, it is desirable to provide a form of local storage that is man-
aged implicitly and automatically by low-level (typically hardware-implemented)
mechanisms within the architecture. We refer to such local memory as cache (pro-
nounced “cash”) memory.
Cache memory intercepts all accesses to main memory. If the requested data
item is already present in the cache, it is either returned with low latency (in the
case of a read request) or modified in place (in the case of a write request). If
the requested item is not present in the cache, some previously cached item is
evicted, the requested item is read from main memory into the cache, and then
either returned with high latency (in the case of a read request) or modified in place
(in the case of a write request). All of this happens implicitly, without programmer
intervention, so there is no opportunity for programmer error. But programmers
can significantly influence performance by coding to maximize data locality, since
accesses to main memory are so costly.
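The hit/miss behavior described above can be sketched as a small simulation. The code below models a direct-mapped cache, a simplifying assumption (real caches are usually set-associative); every main-memory block maps to exactly one cache line, and a tag records which block currently occupies that line. All names here are illustrative, not part of any real hardware interface.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NUM_LINES  256   /* number of cache lines */
#define BLOCK_SIZE 64    /* bytes per line */

typedef struct {
    bool     valid;            /* does this line hold any block?   */
    uint32_t tag;              /* which memory block it holds      */
    uint8_t  data[BLOCK_SIZE];
} CacheLine;

typedef struct {
    CacheLine lines[NUM_LINES];
    unsigned  hits, misses;
} Cache;

/* Read one byte through the cache. On a hit the byte is returned
 * with low latency; on a miss the resident block is evicted, the
 * requested block is fetched from main memory (modeled as a flat
 * array), and only then is the byte returned. */
uint8_t cache_read(Cache *c, const uint8_t *main_mem, uint32_t addr) {
    uint32_t block  = addr / BLOCK_SIZE;
    uint32_t line   = block % NUM_LINES;  /* direct mapping */
    uint32_t offset = addr % BLOCK_SIZE;
    CacheLine *l = &c->lines[line];

    if (l->valid && l->tag == block) {
        c->hits++;                        /* low-latency path  */
    } else {
        c->misses++;                      /* high-latency path */
        memcpy(l->data, main_mem + (size_t)block * BLOCK_SIZE, BLOCK_SIZE);
        l->tag   = block;
        l->valid = true;
    }
    return l->data[offset];
}
```

Reading a region sequentially makes the benefit of locality concrete: the first byte of each 64-byte block misses, and the remaining 63 bytes hit, so only one slow main-memory transfer is paid per block.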
14. OpenGL versions 3.3 and later have adopted the Direct3D approach of decoupling
texture images and interpolation modes.