Chapter 5
Pipelined Wormhole Routers
The single-cycle wormhole router performs all the tasks involved per input and per
output serially. Each packet must first complete routing computation (RC), unless
lookahead routing computation is used in the design of the router, then compete for
access to the output via switch allocation/arbitration (SA), and finally move to the
appropriate output through the multiplexers of the crossbar (Switch Traversal, ST).
Eventually, the packet reaches the next router after leaving the output buffer and
crossing the link (Link Traversal, LT). We assume that the input/output links of the
router are independently flow controlled, following the credit-based flow control
described in the previous chapters.
A block diagram of the single-cycle router is shown in Fig. 5.1. The output
buffers of the router can be either simple pipeline registers or normal flow-controlled
buffers. In the first case, the credit counter tracks the available slots of the
buffer at the input of the next router, while in the second case, the credit counter
mirrors the available slots of the local output buffer. In the rest of this chapter,
we adopt the first design option and assume that the output of the router consists
of a simple pipeline register for the signals in the forward (valid, data) and in the
backward direction (credit update).
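
The credit counter of the first design option can be sketched in a few lines of Python. This is only an illustrative behavioral model under stated assumptions, not the router's actual hardware: the class name `CreditCounter`, its method names, and the two-slot downstream buffer are hypothetical choices made for the example. The counter starts at the number of slots in the next router's input buffer, decrements when a flit is sent, and increments when a credit update returns over the backward direction.

```python
class CreditCounter:
    """Sender-side credit counter for one router output (illustrative model).

    Assumption: the counter tracks free slots in the input buffer of the
    NEXT router, as in the "simple pipeline register" output option.
    """

    def __init__(self, downstream_slots: int) -> None:
        if downstream_slots <= 0:
            raise ValueError("downstream buffer must have at least one slot")
        self.max_credits = downstream_slots
        self.credits = downstream_slots  # all downstream slots start free

    def can_send(self) -> bool:
        # A flit may leave only if the downstream buffer has a free slot.
        return self.credits > 0

    def on_send(self) -> None:
        # Sending a flit consumes one downstream slot.
        if not self.can_send():
            raise RuntimeError("attempted to send without credits")
        self.credits -= 1

    def on_credit_update(self) -> None:
        # The downstream router freed one slot and returned a credit.
        if self.credits == self.max_credits:
            raise RuntimeError("credit update exceeds downstream capacity")
        self.credits += 1


if __name__ == "__main__":
    counter = CreditCounter(downstream_slots=2)  # hypothetical buffer depth
    counter.on_send()
    counter.on_send()
    print(counter.can_send())   # False: both downstream slots are occupied
    counter.on_credit_update()
    print(counter.can_send())   # True: one slot was freed downstream
```

In this model the output stalls whenever `can_send()` is false, which is the condition under which the switch allocator should not grant the output to any input.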
Depending on the system's characteristics, the network on chip should be able
to operate at both low and high clock frequencies. In a single-cycle router,
the clock frequency is limited by the cumulative delay of all
operations depicted in Fig. 5.1 plus the clocking overhead, which for register-based
implementations (edge-triggered flip-flops) is the sum of the register's
clock-to-output delay and its setup time (Weste and Harris 2010).
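
As a rough illustration of this limit, the minimum clock period of a single-cycle router can be estimated by summing the delays of the serially executed tasks and adding the clocking overhead. The sketch below is a back-of-the-envelope calculation only: the function names and all delay values (in picoseconds) are hypothetical and were chosen for the example, not taken from any real implementation.

```python
def min_clock_period_ps(task_delays_ps, t_clk_to_q_ps, t_setup_ps):
    """Minimum clock period of a single-cycle router, in picoseconds.

    All tasks execute serially within one cycle, so their delays add up;
    the clocking overhead (clock-to-output delay plus setup time) is
    added on top of the combinational delay.
    """
    logic_delay_ps = sum(task_delays_ps.values())
    return logic_delay_ps + t_clk_to_q_ps + t_setup_ps


def max_frequency_mhz(period_ps):
    """Convert a clock period in picoseconds to a frequency in MHz."""
    return 1e6 / period_ps


# Hypothetical per-task delays, for illustration only.
tasks = {"RC": 300, "SA": 500, "ST": 400}

period = min_clock_period_ps(tasks, t_clk_to_q_ps=100, t_setup_ps=50)
print(period)                               # 1350
print(round(max_frequency_mhz(period), 1))  # 740.7
```

With these assumed numbers the clocking overhead accounts for 150 ps of the 1350 ps period, and the remaining 1200 ps is the cumulative delay of RC, SA, and ST.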
Achieving higher clock frequencies requires splitting the timing paths
of the single-cycle implementation into multiple paths of shorter delay, called
pipeline stages. In this way, the delay seen between any two registers is decreased,
which allows a higher operating clock frequency. The separation involves
the addition of pipeline registers between selected tasks that retime the transfer of
information across stages to different cycles of operation. This inevitable retiming