It is important to point out that some of these suggestions might carry extra CPU
costs on certain architectures. For example, extra instructions might be required
to operate on a smaller integer or on a single bit of a bit field. With more
instructions to execute, pressure on the instruction cache increases in exchange
for relief on the data cache. As such, it is always a good idea to profile the code,
ideally both before and after any change, to verify that the change is an actual
improvement.
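To make the trade-off concrete, the following is a minimal sketch, assuming two hypothetical particle records (the names WideParticle, PackedParticle, SetType, and IsDrawable are illustrative and not from the text). The packed version is smaller, but reading a bit field typically compiles to a load plus extra shift or mask instructions, and storing into the narrow field needs a range check. Exact sizes and instruction counts are implementation-dependent, so only profiling can tell whether the change pays off.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical record using full-width fields.
struct WideParticle {
    std::int32_t type;       // only a handful of distinct values in practice
    std::int32_t isActive;   // used as a boolean
    std::int32_t isVisible;  // used as a boolean
};

// The same information stored in a narrow integer and two single-bit fields.
struct PackedParticle {
    std::uint8_t type;
    std::uint8_t isActive : 1;
    std::uint8_t isVisible : 1;
};

// Storing into the narrower field requires the value to fit in 8 bits.
inline void SetType(PackedParticle &p, int type) {
    assert(type >= 0 && type <= 255);
    p.type = static_cast<std::uint8_t>(type);
}

// Each bit-field read usually costs extra shift/mask work compared
// with reading a full-width integer.
inline bool IsDrawable(const PackedParticle &p) {
    return p.isActive && p.isVisible;
}
```

On common compilers PackedParticle occupies far fewer bytes than WideParticle, so more records fit in each cache line, at the price of the extra instructions noted above.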
Field reordering can be difficult to do properly. In a few cases, it is clear that certain
fields should be stored within the same cache line, such as when the fields are always
accessed together. For example, the fields of position, velocity, and acceleration are
frequently accessed simultaneously and could benefit from being stored together. In
most cases, however, it is far from trivial to determine the best field arrangement.
Ideally, a compiler should take profile data from a previous run to perform the
reordering automatically on a subsequent compile, but language standards may not
allow field reordering and few (if any) compilers currently support such optimization.
One way of getting a better idea of how to reorder the fields is to access all structure
fields indirectly through accessor functions. It is then possible to (temporarily)
instrument these functions to measure which fields are frequently accessed and
which are accessed together, and to manually reorder the fields based on these
access statistics. Note that adding padding in some cases improves performance
by making two frequently accessed neighboring fields fall within one cache line
instead of straddling two.
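The accessor-instrumentation idea might be sketched as follows. Everything here is an assumption made for illustration: the Body class, its three fields, the Field enumeration, and the AccessStats bookkeeping are hypothetical, and "accessed together" is approximated simply as "accessed consecutively".

```cpp
#include <cassert>

// Hypothetical identifiers for the fields being measured.
enum Field { kPosition, kVelocity, kMass, kNumFields };

// Temporary instrumentation, to be removed once the layout is decided.
struct AccessStats {
    unsigned long count[kNumFields] = {};                 // accesses per field
    unsigned long together[kNumFields][kNumFields] = {};  // consecutive pairs
    int last = -1;                                        // previous field, -1 if none

    void Record(Field f) {
        assert(f < kNumFields);
        ++count[f];
        if (last >= 0 && last != f) {  // count the pair symmetrically
            ++together[last][f];
            ++together[f][last];
        }
        last = f;
    }
};

inline AccessStats &Stats() {
    static AccessStats stats;
    return stats;
}

// All field access goes through accessors, so each one can be counted.
class Body {
public:
    Body(float position, float velocity)
        : position_(position), mass_(1.0f), velocity_(velocity) {}
    float GetPosition() const { Stats().Record(kPosition); return position_; }
    float GetVelocity() const { Stats().Record(kVelocity); return velocity_; }
    float GetMass() const     { Stats().Record(kMass);     return mass_; }
    void SetPosition(float p) { Stats().Record(kPosition); position_ = p; }
private:
    float position_;
    float mass_;      // sits between two fields that are used together
    float velocity_;
};

// Example workload: advance the position by velocity * dt.
inline void Integrate(Body &b, float dt) {
    float p = b.GetPosition();
    float v = b.GetVelocity();
    b.SetPosition(p + v * dt);
}
```

After a representative run, high entries in together[][] suggest fields to place next to each other (here position and velocity), while fields with low counts, such as mass in this workload, are candidates for moving out of the way.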
The concept of hot/cold splitting of structures entails splitting the structure into two
different parts: one containing the frequently accessed fields (the hot part) and the
other the infrequently accessed fields (the cold part). These pieces are allocated and
stored separately, and a link pointer to the corresponding cold part is embedded in
each hot part (Figure 13.2). Although infrequently accessed fields now require a level
of indirection to be accessed, thus incurring an extra cost, the main (hot) structure
can now become smaller and the data cache is used more efficiently. Depending on
how the structures are being accessed, the link pointer might not be necessary and,
for instance, a single index could be used to access both an array of hot parts and
another array of cold parts.
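Both variants of hot/cold splitting could look roughly like the sketch below. The field choices and the names ObjectCold, ObjectHot, HotNoLink, and ObjectStore are hypothetical; which fields are actually hot has to come from profiling the real access pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Cold part: infrequently accessed fields (hypothetical choice).
struct ObjectCold {
    char name[64];
    int  debugFlags;
};

// Variant 1: hot part with an embedded link pointer to its cold part.
struct ObjectHot {
    float position[3];
    float velocity[3];
    ObjectCold *cold;  // reaching cold data costs one extra indirection
};

// Variant 2: no link pointer; hot and cold parts live in parallel arrays
// and the same index addresses both.
struct HotNoLink {
    float position[3];
    float velocity[3];
};

struct ObjectStore {
    std::vector<HotNoLink>  hot;
    std::vector<ObjectCold> cold;

    // Appends both parts and returns their shared index.
    std::size_t Add(const HotNoLink &h, const ObjectCold &c) {
        hot.push_back(h);
        cold.push_back(c);
        assert(hot.size() == cold.size());
        return hot.size() - 1;
    }
};
```

In the second variant the hot structure is smaller still, since it carries no pointer, but it only works when every access path already has the index at hand.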
As a simple example of structure splitting, in which no link pointer is required,
consider the following code for searching through an array of structures and
incrementing the count field of the element that has the largest value field.
struct S {
    int32 value;
    int32 count;
    ...
} elem[1000];

// Find the index of the element with largest value
int index = 0;
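One way the search, and a split version of it, might be written is sketched here. This is an illustration under stated assumptions rather than the original listing: std::int32_t stands in for int32, the 56-byte other member is a hypothetical placeholder for the elided fields, and the names FindLargest, SplitS, and Cold are invented for the example.

```cpp
#include <cassert>
#include <cstdint>

const int NUM_ELEMS = 1000;

// Unsplit layout. 'other' is a hypothetical stand-in for the elided fields.
struct S {
    std::int32_t value;
    std::int32_t count;
    char other[56];
};

// Returns the index of the element with the largest value (first one on ties).
// Every iteration pulls in a whole S although only 'value' is read.
int FindLargest(const S *elem, int n) {
    assert(n > 0);
    int index = 0;
    for (int i = 1; i < n; i++)
        if (elem[i].value > elem[index].value) index = i;
    return index;
}

// Split layout: the values are packed contiguously, everything else is cold.
struct Cold {
    std::int32_t count;
    char other[56];
};

struct SplitS {
    std::int32_t value[NUM_ELEMS];  // hot during the search
    Cold rest[NUM_ELEMS];           // touched only for the winning element
};

// Same search over the split layout; only the dense value array is scanned.
int FindLargest(const SplitS &s, int n) {
    assert(n > 0 && n <= NUM_ELEMS);
    int index = 0;
    for (int i = 1; i < n; i++)
        if (s.value[i] > s.value[index]) index = i;
    return index;
}
```

With either layout the count is then incremented through the returned index, for example elem[index].count++ or s.rest[index].count++. In the split layout the loop reads 4 bytes per element instead of striding over full structures, so far fewer cache lines are touched during the search.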
 