data path involves only multiplexing operations inside the input buffer, driven by
the FIFO's pointers and the per-output multiplexers of the crossbar that end up at
the output pipeline register.
5.1.1 Credit Consume and State Update
As explained in Chap. 3, updating the outAvailable flag and consuming the necessary
credits should be triggered once a flit traverses the output multiplexer and is
about to be written to the output pipeline register. Although this might seem a safe
and reasonable choice (and it is indeed for a single-cycle router), it introduces a
non-negligible delay overhead. A closer look reveals that, in this organization,
SU and CC can occur only after SA is completed and, most importantly, after the
flits of all inputs have been multiplexed (for example, to check whether a head or
tail flit exists at the granted input). This multiplexing is non-trivial in terms of
delay and, in real-life applications, it may limit the benefits of pipelining.
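The dependency described above can be made concrete with a small behavioral sketch. This is not the book's RTL. The function name, the list-based encoding of grants, and the string flit kinds are assumptions chosen for illustration. The point is that the state and credit update cannot start until the grant vector has selected one input's flit type.

```python
def su_cc_after_sa(out_available, credits, flit_kinds, grants):
    """Baseline ordering: SU and CC wait for SA and for the input multiplexing.

    flit_kinds[i] is "head", "body", "tail" or None for input i (assumed encoding).
    grants is the one-hot grant vector produced by SA for this output.
    """
    granted_kind = None
    # Multiplex the flit type of the granted input. In hardware this
    # N-to-1 selection sits on the critical path, after SA.
    for kind, grant in zip(flit_kinds, grants):
        if grant:
            granted_kind = kind

    if granted_kind is None:
        return out_available, credits      # nothing was granted this cycle

    credits -= 1                           # CC: one flit leaves for this output
    if granted_kind == "head":
        out_available = False              # SU: output is now held by a packet
    elif granted_kind == "tail":
        out_available = True               # SU: packet finished, release output
    return out_available, credits
```

For example, with two competing head flits and a grant to the second input, `su_cc_after_sa(True, 4, ["head", "head", None], [False, True, False])` lowers the flag and consumes one credit.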
The problem can be completely eliminated by making an important observation:
both CC and SU can be executed without knowing specifically which input allocates
the output port or consumes an output credit. It suffices to know that some input
wins arbitration or sends a flit forward. Therefore, since the SA result is not
required, those operations can occur in parallel with SA. For SU, this translates
to checking whether any request from a head or a tail flit exists, in order to lower
or raise the outAvailable flag, respectively. CC decrements the output credit counter
if the corresponding output receives at least one request. Notice that once the
output's outAvailable flag is lowered, request generation forbids any requests to
that output unless they originate from the winning input. The outAvailable flag is
raised again once a tail flit makes a request (receiving a grant is guaranteed), and
its updated value will be visible to the remaining inputs in the next clock cycle.
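The observation above can be sketched as a behavioral model of one output port. This is only an illustration under stated assumptions. The names `Flit`, `OutputState` and `cc_su_step`, the initial credit count of 4, and the omission of credit returns from the downstream router are not from the text. The sketch also assumes that request generation has already filtered the requests (available credits, and only the winning input while the flag is low) and that packets have distinct head and tail flits.

```python
from dataclasses import dataclass
from enum import Enum


class Flit(Enum):
    HEAD = "head"
    BODY = "body"
    TAIL = "tail"


@dataclass
class OutputState:
    out_available: bool = True  # output free to be allocated by a new packet
    credits: int = 4            # assumed downstream buffer depth


def cc_su_step(state, requests):
    """One cycle of CC and SU computed from the requests only, without SA's grants.

    requests: flit types of all inputs requesting this output in this cycle,
    already qualified by request generation (an assumption of this sketch).
    """
    any_request = len(requests) > 0
    any_head = Flit.HEAD in requests
    any_tail = Flit.TAIL in requests

    # CC: at least one request means exactly one flit will be sent.
    credits = state.credits - 1 if any_request else state.credits

    # SU: a head request lowers the flag, a tail request raises it.
    # Head and tail requests cannot coexist here, because a tail only
    # requests while the flag is low and heads are then blocked.
    if any_head:
        out_available = False
    elif any_tail:
        out_available = True
    else:
        out_available = state.out_available

    return OutputState(out_available, credits)


# A 3-flit packet wins the output while a second head competes in the first cycle.
s = cc_su_step(OutputState(), [Flit.HEAD, Flit.HEAD])  # flag lowered
s = cc_su_step(s, [Flit.BODY])                         # flag stays low
s = cc_su_step(s, [Flit.TAIL])                         # flag raised again
```

Only OR-style reductions over the request vector are needed, which is why these updates need not wait for the arbiter or for the multiplexing of the granted input's flit type.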
5.1.2 Example of Packet Flow in the Single-Cycle Router
The operations executed in the single-cycle wormhole router of Fig. 5.2 can be seen
in Fig. 5.3. The execution diagram refers to the behavior of a single input that
receives a continuous stream of incoming packets, each consisting of 3 flits (one
head, one body and one tail flit). This kind of traffic is selected because it readily
exposes any latency- or throughput-related inefficiencies of the pipelined
organizations that will be presented in later sections. In parallel, the remaining
inputs follow a similar execution, assuming that their requests and data move to a
different output. A given output can host the packet (on a flit-by-flit basis) of
only one input at a time.
In cycle 0, a head flit is written at the input buffer (Buffer Write - BW), after
crossing the link (Link Traversal - LT). The flit immediately appears at the frontmost
position of the input buffer in cycle 1, and is able to execute all necessary operations