Pipelined Wormhole Routers - Microarchitecture of Network-on-Chip Routers

Chapter 5

Pipelined Wormhole Routers

The single-cycle wormhole router performs all the tasks involved per input and per output serially. Each packet must first complete routing computation (RC), unless the router uses lookahead routing computation. It then competes for access to the output through switch allocation/arbitration (SA) and moves to the appropriate output through the multiplexers of the crossbar (switch traversal, ST). Finally, after leaving the output buffer and crossing the link (link traversal, LT), the packet reaches the next router. We assume that the input/output links of the router are independently flow controlled, following the credit-based flow control described in the previous chapters.
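The serial task order described above can be sketched in a few lines of Python. This is only an illustrative model, not the book's design: the `Flit` type, the `single_cycle_step` function, and the fixed-priority arbiter are hypothetical. Each flit is assumed to already name its output port, and every flit is treated as a single-flit packet, so the per-packet output locking of wormhole switching is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Flit:
    dest_port: int  # assumption: the destination is already expressed as an output port
    payload: str


def single_cycle_step(requests, credits):
    """Model one cycle of a single-cycle router (illustrative only).

    requests: dict mapping input port -> Flit waiting at the head of that input.
    credits:  list with the available credits of each output (updated in place).
    Returns a dict mapping output port -> (input port, Flit) written to the
    output pipeline register in this cycle.
    """
    # RC: determine the output port requested by each input.
    wanted = {inp: flit.dest_port for inp, flit in requests.items()}

    # SA: one winner per output; the lowest input index wins (assumed policy).
    # An output with no credits grants nobody.
    grants = {}
    for inp in sorted(wanted):
        out = wanted[inp]
        if credits[out] > 0 and out not in grants:
            grants[out] = inp

    # ST: winners cross the crossbar and each consumes one credit.
    moved = {}
    for out, inp in grants.items():
        moved[out] = (inp, requests[inp])
        credits[out] -= 1
    return moved


credits = [1, 1, 1]
moved = single_cycle_step(
    {0: Flit(2, "a"), 1: Flit(2, "b"), 2: Flit(1, "c")}, credits
)
# Inputs 0 and 1 both request output 2; input 0 wins and input 1 must retry.
```

Link traversal (LT) is not modeled here, since it occurs after the flit leaves the output register.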

A block diagram of the single-cycle router is shown in Fig. 5.1. The output buffers of the router can be either simple pipeline registers or normal flow-controlled buffers. In the first case, the credit counter tracks the available slots of the buffer at the input of the next router, while in the second case, the credit counter mirrors the available slots of the local output buffer. In the rest of this chapter, we adopt the first design option and assume that the output of the router consists of a simple pipeline register for the signals in the forward direction (valid, data) and in the backward direction (credit update).
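As a rough illustration of the first design option, the credit counter at a router output can be modeled as follows. The class name `CreditCounter`, its method names, and the buffer depth used in the example are assumptions made for this sketch. The counter starts at the depth of the next router's input buffer, decreases when a flit is sent, and increases when a credit update returns.

```python
class CreditCounter:
    """Tracks free slots of the downstream input buffer (illustrative sketch)."""

    def __init__(self, depth):
        self.depth = depth    # number of slots in the next router's input buffer
        self.credits = depth  # all slots are free initially

    def can_send(self):
        # A flit may leave only if the downstream buffer has a free slot.
        return self.credits > 0

    def on_send(self):
        # Called when a valid flit is written to the output pipeline register.
        if self.credits == 0:
            raise RuntimeError("sent a flit without an available credit")
        self.credits -= 1

    def on_credit_update(self):
        # Called when the next router reports that it freed one buffer slot.
        if self.credits == self.depth:
            raise RuntimeError("credit update exceeds the buffer depth")
        self.credits += 1


counter = CreditCounter(depth=2)  # hypothetical two-slot downstream buffer
counter.on_send()
counter.on_send()
blocked = not counter.can_send()  # True: both downstream slots are occupied
counter.on_credit_update()        # one slot was freed downstream
```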

Depending on the system's characteristics, the network on chip should be able to operate at both low and high clock frequencies. In single-cycle routers, the clock frequency of the NoC router is limited by the cumulative delay of all operations depicted in Fig. 5.1 plus the clocking overhead, which for register-based implementations (edge-triggered flip-flops) is the sum of the clock-to-data-out delay and the register's setup time (Weste and Harris 2010).
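The timing constraint above amounts to a simple sum, sketched below. The function names and all delay values are hypothetical and chosen only to make the arithmetic concrete. They are not taken from the book or from any technology library.

```python
def min_clock_period_ps(op_delays_ps, t_clk_to_q_ps, t_setup_ps):
    """Minimum clock period of a single-cycle router, in picoseconds.

    op_delays_ps:  delays of the operations executed serially in one cycle.
    t_clk_to_q_ps: clock-to-data-out delay of the launching register.
    t_setup_ps:    setup time of the capturing register.
    """
    return sum(op_delays_ps) + t_clk_to_q_ps + t_setup_ps


def max_frequency_ghz(period_ps):
    # A period of 1000 ps corresponds to a frequency of 1 GHz.
    return 1000.0 / period_ps


# Assumed delays for three serial operations, plus the clocking overhead.
period = min_clock_period_ps([150, 350, 200], t_clk_to_q_ps=60, t_setup_ps=40)
frequency = max_frequency_ghz(period)  # 800 ps period, i.e. 1.25 GHz
```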

Achieving higher clock frequencies requires splitting the timing paths of the single-cycle implementation into multiple shorter ones, called pipeline stages. In this way, the delay between any two registers decreases, which allows a higher operating clock frequency. The separation adds pipeline registers between selected tasks, which retime the transfer of information across stages to different cycles of operation. This inevitable retiming
