Pipelined Wormhole Routers - Microarchitecture of Network-on-Chip Routers

correctly this body flit driven by the stored grant signals at the corresponding output.

The tail flit in cycle 4 repeats the same procedure and moves to ST without waiting

for any SA grants. The head flit of the new incoming packet reaches the frontmost

position of the input buffer in cycle 5, initiating a new round of RC, SA and ST

operations, similar to the head flit of the previous packet.

This stored-grants approach renders the previously presented “elementary” SA

pipeline obsolete. It reduces bubbles significantly with only minimal

delay overhead to the router's control path. It also looks as if it simplifies the

allocation procedure. However, in essence, it simply adds extra state registers to

the arbiter's path and a multiplexer in both the control and data paths. Therefore, this

approach is avoided in the single-cycle version, and the original request generation logic

is preferred. For the same reason, stored grants will not be used in the next

pipelined SA configuration, which uses a pipeline register in both the control path and the

datapath, although they would have been a possible choice there. The stored-grants approach

for the body and tail flits is a useful pipeline alternative when SA is separated from

ST solely in the control path.
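The stored-grants idea can be illustrated with a minimal behavioral sketch. It is not the book's hardware description: the class `StoredGrantOutput`, its `select` method, and the fixed-priority choice among competing head flits are all assumptions made for illustration, and the one-cycle delay of the registered grants is not modeled. The sketch only shows that the head flit is the one that arbitrates, while body and tail flits reuse the stored result.

```python
from typing import Dict, Optional


class StoredGrantOutput:
    """Hypothetical model of one router output that remembers its last grant."""

    def __init__(self) -> None:
        # Extra state register: the input that currently owns this output.
        self.owner: Optional[int] = None

    def select(self, requests: Dict[int, str]) -> Optional[int]:
        """Pick the input served in this cycle.

        `requests` maps an input index to the kind of flit ('head', 'body'
        or 'tail') waiting at the front of that input for this output.
        """
        if self.owner is not None:
            # A packet holds the output: its body/tail flits skip arbitration.
            kind = requests.get(self.owner)
            if kind is None:
                return None  # the owner has no flit ready in this cycle
            winner = self.owner
            if kind == "tail":
                self.owner = None  # the tail flit releases the stored grant
            return winner

        # Output is free: arbitrate among head flits (fixed priority, assumed).
        heads = sorted(i for i, kind in requests.items() if kind == "head")
        if not heads:
            return None
        self.owner = heads[0]  # store the grant for the rest of the packet
        return self.owner
```

In this sketch a competing head flit at another input is ignored until the tail flit of the current packet has passed, which mirrors how a new packet starts a fresh round of SA only after the previous one has left.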

5.3.3 Idle-Cycle Free Operation of the SA Pipeline Stage

The dependencies arising from delaying the delivery of the grants of SA to the

crossbar and to the inputs of the routers can be alternatively resolved by adding an

extra input pipeline register to the data path. The added data pipeline register, shown

in Fig. 5.13, does not have to be flow-controlled since no flit will ever stall in this

position. This data pipeline register is just used to align the arrival of the registered

grant signals with the arrival of the corresponding flit to the input of the crossbar.

Since the grant signals must always be aligned with the corresponding data, the

grant signals delivered to the inputs must be taken from before the grant pipeline

register (as done in Fig. 5.13). This is needed since the dequeued data will reach

the input of the crossbar one cycle later; they will spend one cycle passing the data

pipeline register. This extra cycle also requires an extra buffer slot at the output

buffer (at the input of the next router) for full throughput operation, since forward

latency L_f is increased by 1.
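The need for the extra buffer slot can be checked with a small credit-loop model. This is a sketch under stated assumptions, not the book's model: the function `sent_flits`, the backward (credit return) latency `l_b`, and the ideal receiver that drains every flit on arrival are all hypothetical. Under these assumptions a credit spent in cycle t is usable again in cycle t + l_f + l_b, so the sender needs at least l_f + l_b slots to transmit in every cycle.

```python
from collections import deque


def sent_flits(slots: int, l_f: int, l_b: int, cycles: int) -> int:
    """Count flits sent in `cycles` cycles over a credit-based link.

    slots: buffer slots at the receiver (initial credits of the sender)
    l_f:   forward latency in cycles (flit travel time)
    l_b:   backward latency in cycles (credit return time), assumed name
    """
    credits = slots
    returns = deque()  # cycles at which spent credits become usable again
    sent = 0
    for t in range(cycles):
        # Collect every credit that has completed the round trip by cycle t.
        while returns and returns[0] <= t:
            returns.popleft()
            credits += 1
        if credits > 0:
            credits -= 1
            sent += 1
            # The flit arrives after l_f cycles and is drained immediately
            # (ideal receiver); the credit then needs l_b cycles to return.
            returns.append(t + l_f + l_b)
    return sent
```

With illustrative values l_f = 2 and l_b = 1, two slots let the sender transmit in only two out of every three cycles, while three slots sustain one flit per cycle. Raising l_f by 1 therefore raises the slot count needed for full throughput by 1 in this model.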

The pipeline flow diagram that corresponds to the pipelined organization of

Fig. 5.13 is shown in Fig. 5.14. In this case, the head flit in cycle 1 performs RC

and SA and, after accepting the grants in the same cycle, it is dequeued and moves

to the data pipeline register after having consumed the necessary credit. In cycle 2,

the head flit leaves the data pipeline register and moves to the selected output using

the grants produced by the corresponding output arbiter in the previous cycle. In

the same cycle, the following body flit, which arrived in cycle 1 and now sits

in the frontmost position of the input buffer, can perform SA and, once granted,

can also move to the data pipeline register, consuming the necessary

downstream credit in parallel. The same holds for the following tail flit that can perform all

needed operations without experiencing any idle cycles. In cycle 4, the full overlap
