Game Development Reference
In-Depth Information
also potentially free up Shader and Texture Unit cycles along the way. Finally,
reducing the size of render targets, especially if they are intermediate targets, can
be helpful here as well.
Frame buffer. This is the interface to the memory subsystem; all read and write
requests go through the Frame Buffer unit. Because it is shared, many of the
suggestions above apply, but the most common cause of a bottleneck here is
pixel throughput. Since pixels are likely the most numerous items, or threads,
that you process, reducing their number reduces Frame Buffer overhead. Also
make sure you use only the render target and resource formats you actually
need. Compressing textures wherever you can and using floating-point formats
sparingly are both good ways to reduce Frame Buffer activity and relieve this
bottleneck.
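To make the cost of render target formats concrete, the following is a minimal arithmetic sketch, not part of Parallel Nsight. The `TargetFormat` enum and both helper functions are hypothetical names chosen for illustration; the byte sizes are those of uncompressed four-channel formats at 8, 16, and 32 bits per channel. It counts only the bytes needed to write every pixel of a target once and ignores any compression or caching the hardware may apply.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, illustrative subset of four-channel render target formats.
enum class TargetFormat { RGBA8, RGBA16F, RGBA32F };

// Uncompressed size of one pixel in bytes (4 channels x bits per channel / 8).
inline std::uint64_t bytesPerPixel(TargetFormat format) {
    switch (format) {
        case TargetFormat::RGBA8:   return 4;   // 4 x 8-bit
        case TargetFormat::RGBA16F: return 8;   // 4 x 16-bit float
        case TargetFormat::RGBA32F: return 16;  // 4 x 32-bit float
    }
    return 0;  // unreachable for valid enum values
}

// Bytes written when every pixel of the target is written exactly once.
inline std::uint64_t bytesPerFullWrite(std::uint32_t width,
                                       std::uint32_t height,
                                       TargetFormat format) {
    assert(width > 0 && height > 0);
    return static_cast<std::uint64_t>(width) * height * bytesPerPixel(format);
}
```

Under these assumptions, one full write of a 1920 x 1080 target is 8,294,400 bytes in `RGBA8` and 33,177,600 bytes in `RGBA32F`, and halving both dimensions of an intermediate target cuts the figure to a quarter.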
21.5 Tracing Activity across Your CPU and GPU
Games are among the most demanding applications that a PC can run. They stress
all parts of the system, including the CPU, the GPU, and the system bus (PCI-E).
Typical profilers, including the Parallel Nsight Frame Profiler (Section 21.4),
focus on a single API or subsystem and offer no clues as to how well the
different APIs and subsystems work together. The Parallel Nsight Analyzer is a
post-mortem tool that addresses this need by collecting and visualizing data about
how the game utilizes the system across several subsystems and APIs.
Supported subsystems and APIs include the CPU (threads, processes, cores, and
custom events), graphics APIs (Direct3D, OpenGL), and GPGPU APIs (CUDA
C/C++, OpenCL, DirectCompute). In addition, while many profiling tools
significantly change the runtime characteristics of the game they are profiling,
the Analyzer's system trace has very low overhead and thus delivers an accurate
picture of system activity, with only minimal timing changes due to the trace
itself. The Analyzer works on any CPU and GPU hardware, but no GPU Performance
Counters are available on non-NVIDIA hardware.
21.5.1 Configuring and Capturing a Trace
Configuring a trace is simple and requires only specifying the APIs and options
to be traced (see Figure 21.11). For most of the subsystems, no instrumentation,
either manual or automatic, is required, and you may run the trace on any program,
even programs that are binary-only or lack symbols. The one exception to this rule
is NVIDIA Tools Extension events: these are powerful custom events, but you must
add them to your code by hand.
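As a hedged illustration of what instrumenting code with NVIDIA Tools Extension events can look like, the sketch below wraps the NVTX C calls `nvtxRangePushA` and `nvtxRangePop` (declared in `nvToolsExt.h`) in a small scope guard. The `ScopedTraceRange` class, the `USE_NVTX` macro, and the stub fallbacks are assumptions made for this example so that it compiles without the NVTX SDK; they are not part of NVTX or Parallel Nsight. How a given Analyzer version displays these ranges is not covered here.

```cpp
#include <cassert>

#ifdef USE_NVTX               // hypothetical opt-in macro for this sketch
#include <nvToolsExt.h>       // real NVTX header; older versions also need the NVTX library linked
#else
// Stand-in stubs (NOT the real NVTX implementation): they only track nesting depth.
static int g_openRanges = 0;
static inline int nvtxRangePushA(const char*) { return g_openRanges++; }
static inline int nvtxRangePop() { return --g_openRanges; }
#endif

// Hypothetical RAII helper: opens a named range on construction and
// closes it on destruction, so every push is matched by exactly one pop.
class ScopedTraceRange {
public:
    explicit ScopedTraceRange(const char* name) {
        assert(name != nullptr);
        nvtxRangePushA(name);
    }
    ~ScopedTraceRange() { nvtxRangePop(); }

    // Non-copyable: copying would pop the same range twice.
    ScopedTraceRange(const ScopedTraceRange&) = delete;
    ScopedTraceRange& operator=(const ScopedTraceRange&) = delete;
};

// Example of marking a game subsystem; the function name is illustrative.
inline void updatePhysics() {
    ScopedTraceRange range("UpdatePhysics");
    // ... physics work would go here ...
}
```

A scope guard is used here so that early returns and exceptions cannot leave a range open; calling the push and pop functions directly works equally well when the control flow is simple.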
Each API or subsystem has several options that enable or disable the tracing of
specific event types. In particular, consider the expected frequency of an event
when deciding whether to trace it. First, high-frequency events add to the
overhead of the trace, as there are approximately two microseconds of overhead
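As a rough way to judge whether an event type is too frequent to trace, the sketch below multiplies an event count by the two-microsecond figure quoted above. It assumes that this cost applies per traced event and stays constant, which is a simplification; the constant and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// ASSUMPTION: about 2 microseconds of trace overhead per traced event.
constexpr double kAssumedOverheadPerEventUs = 2.0;

// Estimated trace overhead added to one frame, in milliseconds.
inline double traceOverheadMsPerFrame(std::uint64_t eventsPerFrame) {
    return static_cast<double>(eventsPerFrame) * kAssumedOverheadPerEventUs / 1000.0;
}

// Estimated overhead as a fraction of the untraced frame time.
inline double traceOverheadFraction(std::uint64_t eventsPerFrame, double frameTimeMs) {
    assert(frameTimeMs > 0.0);
    return traceOverheadMsPerFrame(eventsPerFrame) / frameTimeMs;
}
```

Under this assumption, tracing 1,000 events per frame adds roughly 2 ms, which is one eighth of a 16 ms frame, while 50 events per frame adds only about 0.1 ms.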