Calls to inline functions from different points in the code pose a similar problem,
since the code replicated in the assembler file cannot be matched back to the source
instrumentation. If function inlining does not require replicating code (e.g. when the
function is inlined at a single call site within a loop body), the technique works
without limitations.
In summary, the error introduced by this technique is minimal, and entirely adequate
for the accuracy required at this abstraction level.
2.3.1.2 Cache Modeling
Cache memories also have a strong influence on SW performance, so cache
modeling is required to obtain accurate estimations. As is well known, cache
memories consist of lines arranged in sets according to the degree of associativity,
which determines how many lines are grouped in each set. For example, the
ARM920T [2] instruction cache contains 512 lines with 64-way associativity.
When the processor requires the information placed at a certain address, all the
possible locations for that address must be searched. The number of locations
depends on the associativity degree; for high degrees, this search is a time-consuming
process, which is critical at the abstraction level required for efficient DSE.
An example is shown in Fig. 2.8. While the overhead of the time annotation is only
one additional code line, the cache access requires a function call for each line to
check the instruction address, and a complex search within each call to determine
whether the values are in the cache or not. Eliminating this drawback is therefore
the main focus of the cache model in order to speed up simulation time.
The solution proposed to minimize the simulation overhead is based on three main
improvements: searching for cache lines instead of single instructions, replacing
the list search with a static annotation, and moving the search from the cache model
to the source code. This is shown in Fig. 2.9.
Since instructions within a basic block are sequential, it is not necessary to search
the cache for each address. If one value (instruction or data) is in the cache, all the
instructions/data in the same line will also be there. Thus, the number of checks is
drastically reduced: if a cache line contains 8 values, checks are performed only
1/8th of the times the actual cache is accessed. In practice, lines
Fig. 2.8 Comparison between the annotation for modeling execution time and
instruction caches. The different mechanisms result in different simulation
overheads. (The figure shows the loop "while (a){ b = c+d[a]; c += 2; a -= 1; }"
annotated in two ways: a single time annotation "total_time += 240;" for the whole
block, versus a "check_cache(addr);" call per line.)