(a) Regular grid without merging
(b) The same grid after merging phase
Figure 5.5. Adaptive tile layout.
5.4 Optimizations
5.4.1 Minimize Number of Batches
Adaptive tile layout. The initial tile layout can be optimized dynamically by
analyzing it and merging adjacent compatible cells (see Figure 5.5). Two tiles
are compatible if they share the same list of aggregated sprites.
This merging operation results in fewer primitives being sent down to the
GPU. Fewer primitives mean less memory traffic (lighter vertex buffer and
smaller Parameter Buffer usage) and also less vertex shader work.
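The merging phase can be illustrated with a small sketch. This is not the chapter's implementation, only one possible greedy approach under stated assumptions: the grid is a list of rows, each cell holds a tuple of sprite ids standing in for its aggregated sprite list, and merged regions are restricted to axis-aligned rectangles. The name merge_tiles is hypothetical.

```python
def merge_tiles(grid):
    """Greedily merge adjacent compatible cells into rectangles.

    grid[row][col] is a tuple of sprite ids (assumed stand-in for the
    tile's aggregated sprite list). Two cells are compatible when their
    tuples are equal. Returns a list of (col, row, width, height, sprites).
    """
    # Pass 1: within each row, collapse horizontal runs of compatible cells.
    row_runs = []
    for row in grid:
        runs = []
        c = 0
        while c < len(row):
            start = c
            while c + 1 < len(row) and row[c + 1] == row[start]:
                c += 1
            runs.append((start, c - start + 1, row[start]))
            c += 1
        row_runs.append(runs)

    # Pass 2: grow a rectangle downward while the next row contains a run
    # with the same column span and the same sprite list.
    merged = []
    open_rects = {}  # (col, width, sprites) -> rectangle still growing
    for r, runs in enumerate(row_runs):
        next_open = {}
        for col, width, sprites in runs:
            key = (col, width, sprites)
            rect = open_rects.get(key)
            if rect is not None:
                rect[3] += 1          # extend the height by one row
            else:
                rect = [col, r, width, 1, sprites]
                merged.append(rect)
            next_open[key] = rect
        open_rects = next_open
    return [tuple(rect) for rect in merged]


# Example: a 3x3 grid with two distinct sprite lists.
A, B = (1,), (1, 2)
example_grid = [
    [A, A, B],
    [A, A, B],
    [A, B, B],
]
print(len(merge_tiles(example_grid)), "primitives instead of 9")
```

A greedy pass like this does not guarantee the minimum number of rectangles, but it is cheap enough to rerun whenever the sprite lists change.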
Single draw call. When using the same texture for all layers (e.g., when rendering
a particle system) or when all tiles are using the same textures in the same order,
we can drastically reduce the number of batches by constructing a single dynamic
VBO to render the whole grid at once.
For each tile, and for each transparent layer it aggregates, the sprite texture
coordinates are extrapolated to the corners of the cell. This computation is
performed on the CPU and the values are appended to the vertex buffer. Thus,
each vertex now contains one vec2 position and eight vec2 texture coordinates;
uniforms are no longer required.
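As a rough sketch of this CPU-side step, the following code extends each sprite's texture mapping linearly to the four corners of a cell and appends the result to a flat float array. Several details are assumptions made for illustration and are not stated in the text: sprites are axis-aligned, each sprite is described by a screen rectangle and a UV rectangle, each tile emits four corner vertices (index data is omitted), and unused layers are padded with (0, 0). The names extrapolate_uv and append_tile are hypothetical.

```python
from array import array

MAX_LAYERS = 8                           # eight vec2 texture coordinates per vertex
FLOATS_PER_VERTEX = 2 + 2 * MAX_LAYERS   # vec2 position + 8 * vec2 = 18 floats


def extrapolate_uv(px, py, sprite):
    """Extend the sprite's linear screen-to-UV mapping to point (px, py).

    sprite is ((sx0, sy0, sx1, sy1), (u0, v0, u1, v1)): an assumed
    axis-aligned screen rectangle and the UV rectangle it maps to.
    Points outside the sprite yield coordinates outside the UV rectangle.
    """
    (sx0, sy0, sx1, sy1), (u0, v0, u1, v1) = sprite
    u = u0 + (px - sx0) / (sx1 - sx0) * (u1 - u0)
    v = v0 + (py - sy0) / (sy1 - sy0) * (v1 - v0)
    return u, v


def append_tile(vertices, cell, sprites):
    """Append the four corner vertices of one cell to the vertex array."""
    if len(sprites) > MAX_LAYERS:
        raise ValueError("a tile can aggregate at most %d layers" % MAX_LAYERS)
    x0, y0, x1, y1 = cell
    for px, py in ((x0, y0), (x1, y0), (x0, y1), (x1, y1)):
        vertices.extend((float(px), float(py)))
        for sprite in sprites:
            vertices.extend(extrapolate_uv(px, py, sprite))
        # Assumption: unused layers are padded so every vertex has the same size.
        vertices.extend((0.0, 0.0) * (MAX_LAYERS - len(sprites)))


# Example: one sprite covering screen rect (0,0)-(4,4) with UVs (0,0)-(1,1),
# aggregated by the cell (1,1)-(2,2).
sprite = ((0.0, 0.0, 4.0, 4.0), (0.0, 0.0, 1.0, 1.0))
vbo_data = array("f")
append_tile(vbo_data, (1.0, 1.0, 2.0, 2.0), [sprite])
print(len(vbo_data) // FLOATS_PER_VERTEX, "vertices")
```

With this layout every vertex carries 18 floats regardless of how many layers the tile actually uses, so the vertex size stays fixed even for sparsely populated tiles.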
The vertex shader is a simple pass-through that lets all attributes be
interpolated and passed to the fragment shader (see Listing 5.3). Note that
even when a tile uses fewer than the eight layers it can render, this shader
still consumes eight interpolators. That many varyings can hurt performance:
the large amount of memory required to store the interpolated values might
lower the number of fragments in flight and could also thrash the
post-transform cache [McCaffrey 12].
When the maximum number of layers is not being used in a given tile, a
solution can be found through dynamic branching to early out and not take the