Pipelined Wormhole Routers - Microarchitecture of Network-on-Chip Routers

data path involves only multiplexing operations inside the input buffer, driven by the FIFO's pointers and the per-output multiplexers of the crossbar that end up at the output pipeline register.

5.1.1 Credit Consume and State Update

As explained in Chap. 3, updating the outAvailable flag and consuming the necessary credits should be triggered once a flit traverses the output multiplexer and is about to be written to the output pipeline register. Although this might seem a safe and reasonable choice (and it is indeed for a single-cycle router), it introduces some non-negligible delay overhead. A closer look reveals that in this organization, state update (SU) and credit consume (CC) must occur only after switch allocation (SA) is completed and, most importantly, after a multiplexing of all inputs is performed (for example, to check whether a head or tail flit exists at the granted input). This multiplexing is non-trivial in terms of delay and, in real-life applications, it may limit the benefits of pipelining.
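The serial dependency described above can be sketched in software. This is a minimal behavioral model, not the book's hardware: the names `Flit`, `OutputState`, `arbitrate` and `serial_update`, the fixed-priority arbiter, and the initial credit count of 4 are all assumptions made for illustration. Requests are assumed to be already qualified by request generation. The point is only the ordering: the flit type needed by SU cannot be read until SA has produced a grant and the grant has selected one input.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Flit:
    kind: str  # "head", "body" or "tail"


@dataclass
class OutputState:
    out_available: bool = True
    credits: int = 4  # assumed buffer depth at the downstream input


def arbitrate(requests: List[bool]) -> Optional[int]:
    """Assumed fixed-priority arbiter: grant the lowest-index requester."""
    for i, req in enumerate(requests):
        if req:
            return i
    return None


def serial_update(state: OutputState,
                  requests: List[bool],
                  front_flits: List[Optional[Flit]]) -> Optional[int]:
    """SU and CC performed only after SA and after the input multiplexing."""
    winner = arbitrate(requests)      # step 1: SA must finish first
    if winner is None:
        return None
    flit = front_flits[winner]        # step 2: multiplex all inputs by the grant
    if flit.kind == "head":           # step 3: SU depends on the selected flit
        state.out_available = False
    elif flit.kind == "tail":
        state.out_available = True
    state.credits -= 1                # step 3: CC, one credit per forwarded flit
    return winner


state = OutputState()
winner = serial_update(state, [False, True, True],
                       [None, Flit("head"), Flit("head")])
print(winner, state)
```

In hardware, steps 1 to 3 form one combinational chain, which is the delay overhead the paragraph refers to.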

The problem can be completely eliminated by making an important observation: both CC and SU can be executed without knowing specifically which input allocates the output port or consumes an output credit. It suffices to know that some input wins arbitration or sends a flit forward. Therefore, since the SA result is not required, those operations can occur in parallel with SA. For SU, this translates to checking whether any request from a head or a tail flit exists, to lower or raise the outAvailable flag, respectively. CC decrements the output credit counter if the corresponding output receives at least one request. Notice that once the output's outAvailable flag is lowered, request generation forbids any requests to that output unless they originate from the winner input. The outAvailable flag is raised again once a tail flit makes a request (receiving a grant is guaranteed), and its updated value will be visible to the remaining inputs in the next clock cycle.
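The grant-independent form of SU and CC can be sketched as a next-state function that looks only at the incoming requests. This is a hedged behavioral model: `OutputState`, `next_state` and the `(valid, kind)` request encoding are hypothetical names, the starting credit count is arbitrary, and credit returns from the downstream router are not modeled. It also assumes, as the text states, that request generation has already masked out requests that are not allowed to reach this output.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class OutputState:
    out_available: bool
    credits: int


def next_state(state: OutputState,
               requests: List[Tuple[bool, str]]) -> OutputState:
    """Compute SU and CC from the requests alone, without the SA grant.

    requests[i] = (valid, kind) is the qualified request of input i,
    where kind is "head", "body" or "tail".
    """
    kinds = [kind for valid, kind in requests if valid]
    any_request = len(kinds) > 0

    out_available = state.out_available
    if "head" in kinds:        # SU: some head flit requests, lower the flag
        out_available = False
    elif "tail" in kinds:      # SU: the tail flit requests, raise the flag
        out_available = True

    # CC: at least one request means exactly one flit will be forwarded.
    credits = state.credits - 1 if any_request else state.credits

    # The returned value models what becomes visible in the next cycle.
    return OutputState(out_available, credits)


s0 = OutputState(out_available=True, credits=4)
s1 = next_state(s0, [(True, "head"), (False, "body"), (True, "head")])
s2 = next_state(s1, [(False, "head"), (False, "body"), (True, "body")])
s3 = next_state(s2, [(False, "head"), (False, "body"), (True, "tail")])
print(s1, s2, s3)
```

Two head flits compete in the first step, yet the update never asks which one won. Only the winner may request in the following steps, so the body and tail requests are unambiguous.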

5.1.2 Example of Packet Flow in the Single-Cycle Router

The operations executed in the single-cycle wormhole router of Fig. 5.2 can be seen in Fig. 5.3. The execution diagram refers to the behavior of a single input that receives a continuous stream of incoming packets, each consisting of 3 flits (one head, one body and one tail flit). This kind of traffic is selected because it easily reveals any latency- or throughput-related inefficiencies of the pipelined organizations that will be presented in later sections. In parallel, the remaining inputs follow a similar execution, assuming that their requests and data move to a different output. A given output can host the packet (on a flit-by-flit basis) of only one input at a time.
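The one-packet-at-a-time rule can be illustrated with a small simulation of a single output shared by several inputs that send 3-flit packets. This is a sketch under stated assumptions rather than a model of Fig. 5.3: `make_packets` and `simulate` are hypothetical helpers, arbitration is fixed-priority, and credits are assumed to be always available.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple

FlitT = Tuple[int, int, str]  # (input id, packet number, flit kind)


def make_packets(input_id: int, n_packets: int) -> Deque[FlitT]:
    """Back-to-back 3-flit packets: one head, one body, one tail."""
    queue: Deque[FlitT] = deque()
    for packet in range(n_packets):
        for kind in ("head", "body", "tail"):
            queue.append((input_id, packet, kind))
    return queue


def simulate(queues: List[Deque[FlitT]], cycles: int) -> List[FlitT]:
    """Forward at most one flit per cycle through a single shared output."""
    owner: Optional[int] = None   # input currently holding the output
    sent: List[FlitT] = []
    for _ in range(cycles):
        if owner is None:
            # Output is free: only head flits at a buffer front may request.
            heads = [i for i, q in enumerate(queues) if q and q[0][2] == "head"]
            if not heads:
                continue
            owner = heads[0]      # assumed fixed-priority arbitration
        if not queues[owner]:
            continue              # owner has no flit buffered yet
        flit = queues[owner].popleft()
        sent.append(flit)
        if flit[2] == "tail":
            owner = None          # tail releases the output for the next cycle
    return sent


trace = simulate([make_packets(0, 1), make_packets(1, 2)], cycles=9)
for flit in trace:
    print(flit)
```

The flits of different inputs never interleave in the trace: once a head flit acquires the output, only that input's body and tail flits follow until the tail releases it.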

In cycle 0, a head flit is written to the input buffer (Buffer Write, BW) after crossing the link (Link Traversal, LT). The flit immediately appears at the frontmost position of the input buffer in cycle 1, and is able to execute all necessary operations
