Calls to inline functions from different points in the code pose a similar problem,
since the code replicated in the assembler file cannot be matched back to the source
instrumentation. If function inlining does not require replicating code (e.g. when the
function is inlined at a single call site within a loop body), the technique works
without limitations.
In summary, the error introduced by this technique is minimal, and entirely adequate
for the accuracy required at this abstraction level.
2.3.1.2 Cache Modeling
Cache memories also have a strong influence on SW performance, so cache
modeling is required to obtain accurate estimations. As is well known, cache
memories consist of lines arranged in sets according to the degree of associativity,
which determines how many lines are grouped in each set. For example, the
ARM920T [2] instruction cache contains 512 lines with 64-way associativity.
When the processor requires the information placed at a certain address, all the
possible locations for that address must be searched. The number of locations
depends on the associativity degree; for high degrees, this search is a time-consuming
process, which is critical at the abstraction level required for efficient DSE.
An example is shown in Fig. 2.8. While the overhead of the time annotation is only
one additional code line, the cache access requires a function call for each line to
check the instruction address, and a complex search within each call to determine
whether the values are in the cache or not. Eliminating this drawback is therefore
the main focus of the cache model in order to speed up simulation time.
The solution proposed to minimize the simulation overhead is based on three main
improvements: searching for cache lines instead of single instructions, replacing
the list search with a static annotation, and moving the search from the cache model
to the source code. This is shown in Fig. 2.9.
Since instructions within a basic block are sequential, it is not necessary to search
the cache for each address. If one value (instruction or data) is in the cache, all the
instructions/data in the same line will also be there. Thus, the number of checks is
drastically reduced: if a cache line contains 8 values, checks are performed only
1/8th of the times the actual cache is accessed. In practice, lines
Fig. 2.8 Comparison between the annotation for modeling execution time and
instruction caches. The different mechanisms result in different simulation
overheads. (The figure shows the loop "while (a){ b = c+d[a]; c += 2; a -= 1; }"
annotated in two ways: a single time annotation "total_time += 240;" for the whole
block, versus a "check_cache(addr);" call per line.)