Tiled Deferred Blending - GPU Pro: Advanced Rendering Techniques


5.2 Algorithm

The algorithm is based on a division of the rendering area into smaller screen-space tiles. Tile layouts are discussed in Section 5.3.1.

Once a layout has been defined, the general algorithm can be summarized in the following steps (see Figure 5.2).

1. Project vertices on CPU to find the screen-space extent of each sprite quad.

2. For each quad, find the tiles it intersects and store the quad's ID in each affected cell.

3. (optional) Optimize grid by grouping compatible cells (see Section 5.4.1).

4. For each tile, compute required information to render aggregated sprites.

5. For each nonempty tile and each fragment, use the interpolated texture coordinates to sample the bound textures and blend the results manually.

First, the sprites' vertices are transformed on the CPU to determine where they will land on screen after projection. Once the screen-space positions are known, we can compute, for each tile, the list of sprites affecting it. The lack of compute shaders in OpenGL ES forces us to perform all these computations on the CPU.
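As an illustration of steps 1 and 2, the following is a minimal sketch rather than the chapter's implementation. It assumes a uniform grid of square tiles, a row-major 4×4 model-view-projection matrix, vertices in front of the camera (no clipping), and a conservative bounding-box test. The names `project_to_screen` and `bin_sprites` are hypothetical.

```python
from math import floor


def project_to_screen(vertex, mvp, width, height):
    """Project a 3D point with a row-major 4x4 matrix and map NDC to pixels."""
    x, y, z = vertex
    clip = [row[0] * x + row[1] * y + row[2] * z + row[3] for row in mvp]
    ndc_x, ndc_y = clip[0] / clip[3], clip[1] / clip[3]  # assumes w > 0
    return ((ndc_x * 0.5 + 0.5) * width, (ndc_y * 0.5 + 0.5) * height)


def bin_sprites(quads, mvp, width, height, tile_size):
    """Return {(tile_x, tile_y): [sprite ids]} for a uniform tile grid."""
    tiles_x = -(-width // tile_size)   # ceiling division
    tiles_y = -(-height // tile_size)
    grid = {}
    for sprite_id, quad in enumerate(quads):
        pts = [project_to_screen(v, mvp, width, height) for v in quad]
        min_x, max_x = min(p[0] for p in pts), max(p[0] for p in pts)
        min_y, max_y = min(p[1] for p in pts), max(p[1] for p in pts)
        # Conservative test: every tile touched by the quad's bounding box.
        tx0 = max(0, int(floor(min_x / tile_size)))
        tx1 = min(tiles_x - 1, int(floor(max_x / tile_size)))
        ty0 = max(0, int(floor(min_y / tile_size)))
        ty1 = min(tiles_y - 1, int(floor(max_y / tile_size)))
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                grid.setdefault((tx, ty), []).append(sprite_id)
    return grid
```

Because sprite IDs are appended in submission order, each per-tile list keeps the order in which the quads were supplied, which matters if that order encodes the transparency order.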

For complex scenes containing many sprites to blend together, SIMD instructions (e.g., the ARM NEON¹ instruction set) provide a good opportunity to reduce the extra CPU overhead induced by the technique. Libraries² are available to get you started quickly.

After computing the list of sprites affecting each tile, we can compute, for each cell and each sprite, the texture coordinate transform that maps the tile's texture coordinates to those of each sprite it aggregates. These 3 × 2 transform matrices (2D rotation + translation) are later passed as uniforms to the vertex shader.
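One possible way to derive such a transform is sketched below, under stated assumptions: the sprite is a parallelogram in screen space (no perspective foreshortening), the tile is an axis-aligned rectangle whose own texture coordinates span [0, 1]², and the six coefficients are returned as two rows of three. The function names and parameter layout are hypothetical and not taken from the chapter.

```python
def tile_to_sprite_transform(tile_min, tile_size, p0, p1, p3):
    """Return an affine map taking tile UVs in [0, 1]^2 to sprite UVs.

    p0, p1 and p3 are the sprite's screen-space corners carrying the
    UVs (0, 0), (1, 0) and (0, 1) respectively.
    """
    eux, euy = p1[0] - p0[0], p1[1] - p0[1]   # sprite U axis on screen
    evx, evy = p3[0] - p0[0], p3[1] - p0[1]   # sprite V axis on screen
    det = eux * evy - evx * euy
    if det == 0.0:
        raise ValueError("degenerate sprite quad")
    # Inverse of the 2x2 matrix whose columns are the U and V axes.
    i00, i01 = evy / det, -evx / det
    i10, i11 = -euy / det, eux / det
    tw, th = tile_size
    ox, oy = tile_min[0] - p0[0], tile_min[1] - p0[1]
    return (
        (i00 * tw, i01 * th, i00 * ox + i01 * oy),
        (i10 * tw, i11 * th, i10 * ox + i11 * oy),
    )


def apply_transform(m, s, t):
    """Map tile texture coordinates (s, t) to sprite texture coordinates."""
    return (m[0][0] * s + m[0][1] * t + m[0][2],
            m[1][0] * s + m[1][1] * t + m[1][2])
```

For example, a 32×32 tile at (32, 0) covering the right half of an axis-aligned 64×64 sprite at the origin maps tile coordinate (0, 0) to sprite coordinate (0.5, 0) and (1, 1) to (1, 0.5).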

Finally, the render phase itself simply consists of rendering the tiles, one at a time, using the interpolated texture coordinates to sample each texture (the same texture can be bound to several samplers if desired) and blending the intermediate results manually, respecting the transparency order.
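The per-fragment work can be imitated on the CPU to make the blending order explicit. This sketch assumes non-premultiplied alpha and the standard "over" operator, neither of which the text specifies. It also assumes layers are listed back to front, samplers are plain callables returning RGBA, and transforms are stored as two rows of three coefficients. All names are hypothetical.

```python
def blend_over(dst_rgb, src_rgba):
    """Composite a non-premultiplied RGBA sample over an RGB color."""
    r, g, b, a = src_rgba
    return tuple(s * a + d * (1.0 - a) for s, d in zip((r, g, b), dst_rgb))


def shade_fragment(tile_uv, layers, background=(0.0, 0.0, 0.0)):
    """Blend every sprite layer covering one fragment of a tile.

    layers is a back-to-front list of (transform, sampler) pairs, where
    transform holds two rows of three affine coefficients and
    sampler(u, v) returns an (r, g, b, a) tuple.
    """
    s, t = tile_uv
    color = background
    for m, sampler in layers:
        u = m[0][0] * s + m[0][1] * t + m[0][2]
        v = m[1][0] * s + m[1][1] * t + m[1][2]
        # Skip sprites that do not cover this fragment.
        if 0.0 <= u <= 1.0 and 0.0 <= v <= 1.0:
            color = blend_over(color, sampler(u, v))
    return color
```

Swapping two layers changes the result, which is why the per-tile sprite lists have to preserve the transparency order.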

An optional optimization phase (step 3) can take place after building the per-tile sprite lists and before computing the extrapolated texture coordinates. This optimization consists of merging cells that share exactly the same sprites, thus lowering the number of primitives sent to the GPU.
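A simple way to picture the merge is a greedy pass that joins horizontally adjacent cells whose sprite lists are identical. This is only a sketch under that assumption; the grouping strategy the chapter actually uses is the one described in Section 5.4.1, and `merge_row_runs` is a hypothetical name.

```python
def merge_row_runs(grid, tiles_x, tiles_y):
    """Merge horizontally adjacent cells with identical sprite lists.

    grid maps (tile_x, tile_y) to an ordered list of sprite ids.
    Returns a list of (tile_x, tile_y, width_in_tiles, sprite_ids).
    """
    merged = []
    for ty in range(tiles_y):
        tx = 0
        while tx < tiles_x:
            sprites = grid.get((tx, ty))
            if not sprites:          # empty cells are never rendered
                tx += 1
                continue
            run = 1
            # Extend the run while the next cell holds the same list,
            # compared in order so the transparency order is preserved.
            while tx + run < tiles_x and grid.get((tx + run, ty)) == sprites:
                run += 1
            merged.append((tx, ty, run, list(sprites)))
            tx += run
    return merged
```

Each merged entry can then be drawn as a single wider primitive instead of `run` separate tiles.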


1 ARM is a registered trademark of ARM Limited (or its subsidiaries) in the EU and/or

elsewhere. NEON is a trademark of ARM Limited (or its subsidiaries) in the EU and/or

2 https://code.google.com/p/math-neon/
