[Figure: node pipeline with stages fetch, decode, map, issue, execute, and dcache; labeled blocks include ICache, IR, decoder, mapper, scoreboard, ALU, AGU, FPU, DCache, L2Cache, Coprocessor 2, and a router with E/W/N/S ports, plus a branch bus and a result bus]
Fig. 8.7 The ICT Many-Core pipelined architecture of a node
8.3.3 The Simulator
The NoC-based many-core design transformer is implemented in C++ as a cycle-accurate
simulator. The simulator reads the configuration from an input file and writes the
system metrics to an output file. Both files are in XML format and follow the rules
of the Design Space Definition file described in Chap. 1.
Figure 8.8 shows the scheme of a single core. A core defines an instruction queue,
whose entries are allocated to instructions newly fetched from the instruction cache
and released after the instruction is committed. In detail, an instruction fetched
from the instruction cache (1) is allocated in the instruction queue (2), and at the
same time its dependencies are built (3). Then, if the required logic unit is idle and
the necessary dependencies are satisfied, the instruction is issued (4) to the
corresponding logic unit, and the state of that logic unit is changed to busy to block
other instructions that need it (5). After the predefined latency and the essential
operations (e.g. accessing the register file, accessing the local data cache, and
routing messages in the mesh), the instruction is committed, and at the same time the
associated resources and dependencies are released. Throughout this process, timing
information is collected accurately so that the simulator is cycle-accurate.
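The issue and commit steps above can be illustrated with a minimal C++ sketch. It is not the simulator's actual code: the names `Unit`, `Instr`, and `simulate` are hypothetical, each instruction is assumed to have at most one dependency on an earlier queue entry, and fetch is assumed to have already filled the queue. Within a cycle, commits are processed before issues, so a unit released in a cycle can accept a new instruction in that same cycle.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names for illustration only; not the simulator's real classes.
enum class Unit { ALU, AGU, FPU };
constexpr std::size_t kUnitCount = 3;

static std::size_t idx(Unit u) { return static_cast<std::size_t>(u); }

struct Instr {
    Unit unit;         // logic unit the instruction needs
    int latency;       // predefined latency in cycles
    int depends_on;    // index of an EARLIER queue entry, or -1 for none
    bool issued;
    bool committed;
    int remaining;     // cycles left before commit
    int commit_cycle;  // timing information recorded at commit
    Instr(Unit u, int lat, int dep)
        : unit(u), latency(lat), depends_on(dep), issued(false),
          committed(false), remaining(0), commit_cycle(-1) {}
};

// Steps the queue cycle by cycle until every instruction has committed.
// Returns the cycle in which the last instruction committed.
int simulate(std::vector<Instr>& queue) {
    bool busy[kUnitCount] = {false, false, false};
    int cycle = 0;
    std::size_t done = 0;
    while (done < queue.size()) {
        ++cycle;
        // Commit: an instruction whose latency has elapsed commits and
        // releases its logic unit.
        for (Instr& in : queue) {
            if (in.issued && !in.committed && --in.remaining <= 0) {
                in.committed = true;
                in.commit_cycle = cycle;
                busy[idx(in.unit)] = false;
                ++done;
            }
        }
        // Issue: requires an idle unit and a satisfied dependency;
        // the unit is then marked busy to block other instructions.
        for (Instr& in : queue) {
            if (in.issued) continue;
            bool dep_ok = in.depends_on < 0 ||
                queue[static_cast<std::size_t>(in.depends_on)].committed;
            if (dep_ok && !busy[idx(in.unit)]) {
                in.issued = true;
                in.remaining = in.latency;
                busy[idx(in.unit)] = true;
            }
        }
    }
    return cycle;
}
```

For example, with a queue of two independent one-cycle ALU instructions followed by a three-cycle FPU instruction that depends on the first, the second ALU instruction waits one cycle for the busy unit and the FPU instruction waits for its producer to commit, so the three instructions commit in cycles 2, 3, and 5 under this sketch's timing convention.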
The power model is based on Princeton University's Wattch power model [1],
with all parameters updated for a 0.13 μm process.
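As background, Wattch-style models estimate the dynamic power of each hardware unit from the standard CMOS switching relation sketched below. The symbols are generic notation rather than names taken from this simulator, and the capacitance and voltage values actually used for the 0.13 μm process are not given in this section.

```latex
P_d = C \, V_{dd}^{2} \, a \, f
% C      : switched (load) capacitance of the unit
% V_{dd} : supply voltage
% a      : activity factor, between 0 and 1
% f      : clock frequency
```

Moving to a different process mainly changes C and V_dd, which is presumably what updating the parameters refers to here.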
 