Fig. 5.3 The execution of necessary operations on a single-cycle wormhole router [timing diagram over cycles 0 to 5 for a head (H), body (B) and tail (T) flit and the next packet's head flit (H); stage labels: LT-BW, RC-SA-DQ-ST, SA-DQ-ST, cc (credit consumption), su (state update)]
within a single cycle: (a) the flit's destination field feeds RC, and the outPort bypass path feeds the request-generation logic; (b) assuming that the output is available, the flit performs SA while, in parallel, it consumes a credit (CC) and updates (SU) the outAvailable flag; finally, (c) the grant produced by SA dequeues (DQ) the flit from the input buffer so that it can traverse the crossbar (ST).
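The request-generation and allocation rules of steps (a) to (c) can be sketched as a small behavioral model. This is a minimal sketch under stated assumptions, not the book's implementation: the names `InputState`, `generate_requests`, `round_robin` and `allocate` are hypothetical, the round-robin arbitration policy is an assumption (the text does not fix one here), and every packet is assumed to have distinct head and tail flits.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class InputState:
    flit: Optional[str]       # "H", "B", "T", or None if the input buffer is empty
    dest_port: int            # output port computed by RC for a head flit
    out_port: Optional[int]   # stored outPort of the packet that owns an output

def generate_requests(inputs: List[InputState], out_available: List[bool],
                      credits: List[int]) -> List[List[bool]]:
    """req[o][i] is True when input i requests output o in this cycle."""
    req = [[False] * len(inputs) for _ in out_available]
    for i, inp in enumerate(inputs):
        if inp.flit is None:
            continue
        if inp.flit == "H":
            # RC result is bypassed to request generation; a head flit may only
            # request an output that is free (outAvailable = 1).
            port, allowed = inp.dest_port, out_available[inp.dest_port]
        else:
            # Body/tail flits reuse the stored outPort of their own packet.
            port, allowed = inp.out_port, True
        if allowed and credits[port] > 0:
            req[port][i] = True
    return req

def round_robin(requests: List[bool], last: int) -> Optional[int]:
    """Grant the first requester after the previous winner (assumed policy)."""
    n = len(requests)
    for k in range(1, n + 1):
        i = (last + k) % n
        if requests[i]:
            return i
    return None

def allocate(inputs: List[InputState], out_available: List[bool],
             credits: List[int], last_grant: List[int]) -> Dict[int, int]:
    """One cycle of SA with CC and SU; returns {output: granted input}."""
    grants: Dict[int, int] = {}
    for o, reqs in enumerate(generate_requests(inputs, out_available, credits)):
        winner = round_robin(reqs, last_grant[o])
        if winner is None:
            continue
        grants[o] = winner
        last_grant[o] = winner
        credits[o] -= 1                      # CC: one downstream slot consumed
        kind = inputs[winner].flit
        if kind == "H":
            out_available[o] = False         # SU: the packet now owns the output
            inputs[winner].out_port = o      # stored outPort for body/tail flits
        elif kind == "T":
            out_available[o] = True          # SU: the tail flit releases the output
        inputs[winner].flit = None           # DQ: the flit leaves the input buffer
    return grants

# Output 0 is owned by input 0 (outAvailable[0] = 0); input 1 holds a new head
# flit for the same output, so only input 0's body flit can request it.
inputs = [InputState("B", dest_port=0, out_port=0),
          InputState("H", dest_port=0, out_port=None)]
out_available, credits, last_grant = [False, True], [4, 4], [1, 1]
print(allocate(inputs, out_available, credits, last_grant))  # {0: 0}
```

Masking head-flit requests with outAvailable is what keeps other inputs from contending for an output while a packet holds it, so a body or tail flit of the owning packet never competes in SA.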
As the head flit moves forward to the output pipeline register, a body flit is written into the input buffer. Since credits are still available for the output buffer, the newly arrived body flit can use the stored outPort value and issue a request to SA. Because it is the only active request (the requests of all other inputs are nullified, since outAvailable = 0), the body flit is granted and moves forward after consuming a credit. In the same cycle, the head flit moves on to the next router. In cycle 3, the tail flit follows the same procedure and also performs SU to release the allocated port (outAvailable = 1), while the next packet's head flit arrives. In cycle 4, all previously allocated resources are already free, and the following packet can generate a request and participate in arbitration, regardless of its destination output port. Observing the rate of incoming and outgoing flits at this input, one notices that a flit needs only a single cycle to exit the router and that no extra cycles are added between packets. A flit may be stalled only if (a) all slots of the output buffer are full, or (b) a head flit loses arbitration (in this case the output port is still utilized, but by a different input).
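The one-flit-per-cycle behavior and the credit stall condition can be checked with a small cycle-level sketch. It models a single input feeding a single uncontended output, so stall condition (b) never occurs and outAvailable is always satisfied; `simulate_input` and the credit round-trip parameter `credit_delay` are hypothetical, and the fixed credit return delay is an assumption.

```python
from collections import deque
from typing import List, Tuple

def simulate_input(flits: List[str], credits: int, credit_delay: int,
                   cycles: int) -> List[Tuple[int, str]]:
    """Single-cycle router, one input feeding one uncontended output.

    Flits arrive back-to-back, one per cycle from cycle 0 (LT-BW). A flit
    written in cycle t may perform SA-DQ-ST from cycle t + 1 onward.
    credit_delay (hypothetical, >= 1) is the number of cycles after which a
    consumed credit returns from the downstream router.
    Returns (cycle, flit) for every flit that leaves the router.
    """
    arrivals = deque(flits)
    buffer = deque()
    credit_returns = deque()   # cycles at which consumed credits come back
    departures = []
    for t in range(cycles):
        # Credits returned by the downstream router become usable this cycle.
        while credit_returns and credit_returns[0] <= t:
            credit_returns.popleft()
            credits += 1
        # Only stall condition modeled here: no free slot in the output buffer.
        if buffer and credits > 0:
            credits -= 1                                # CC
            departures.append((t, buffer.popleft()))    # SA-DQ-ST
            credit_returns.append(t + credit_delay)
        if arrivals:
            buffer.append(arrivals.popleft())           # LT-BW
    return departures

packets = ["H1", "B1", "T1", "H2", "T2"]
# Enough credits: one flit leaves per cycle, with no gap between packets.
print(simulate_input(packets, credits=4, credit_delay=3, cycles=8))
# [(1, 'H1'), (2, 'B1'), (3, 'T1'), (4, 'H2'), (5, 'T2')]
# Too few credits: flits stall whenever the output buffer is full.
print(simulate_input(packets, credits=2, credit_delay=3, cycles=8))
# [(1, 'H1'), (2, 'B1'), (4, 'T1'), (5, 'H2'), (7, 'T2')]
```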
In the rest of this chapter, we will modify the baseline single-cycle organization reviewed in this section step by step to derive pipelined implementations that isolate the RC and SA stages from the rest of the router, with the goal of increasing its clock frequency. The primitive RC and SA pipelined organizations will then be combined in a plug-and-play manner to derive three-stage pipelined organizations that reach even higher clock frequencies.
5.2 The Routing Computation Pipeline Stage
Separating RC from SA and ST is the simplest form of pipelining that can be applied to the router. Because RC is the first operation on the router's control path, the RC pipelined organization only adds a pipeline register to the control path, resulting in the organization shown in Fig. 5.4.
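A behavioral sketch of the added control-path register follows, under assumptions this section does not spell out: a head flit performs RC in one cycle and latches the result in a register (`rc_reg`, a hypothetical name), then performs SA-DQ-ST in the next cycle; body and tail flits reuse the stored outPort and skip RC; SA is always granted and credits are ignored. `route` stands in for an unspecified routing function, and `simulate_rc_pipelined` is likewise hypothetical.

```python
from collections import deque
from typing import Callable, List, Optional, Tuple

def simulate_rc_pipelined(flits: List[Tuple[str, int]],
                          route: Callable[[int], int],
                          cycles: int) -> List[Tuple[int, str, int]]:
    """flits: (kind, dest) pairs written into the input buffer one per cycle
    from cycle 0 (LT-BW). Returns (cycle, kind, out_port) for each SA-DQ-ST."""
    arrivals = deque(flits)
    buffer = deque()
    rc_reg: Optional[int] = None    # pipeline register between RC and SA
    out_port: Optional[int] = None  # stored outPort used by body/tail flits
    departures = []
    for t in range(cycles):
        if buffer:
            kind, dest = buffer[0]
            if kind == "H" and rc_reg is None:
                rc_reg = route(dest)             # stage 1: RC, result latched
            else:
                if kind == "H":
                    out_port, rc_reg = rc_reg, None  # stage 2 reads the latch
                departures.append((t, kind, out_port))  # SA-DQ-ST
                buffer.popleft()
        if arrivals:
            buffer.append(arrivals.popleft())    # LT-BW
    return departures

packet = [("H", 5), ("B", 5), ("T", 5)]
print(simulate_rc_pipelined(packet, route=lambda dest: dest % 4, cycles=6))
# [(2, 'H', 1), (3, 'B', 1), (4, 'T', 1)]
```

In this model only the head flit pays the extra cycle (it leaves in cycle 2 instead of cycle 1 as in Fig. 5.3), while the data path (input buffer and crossbar) is untouched; only the control-path register `rc_reg` is added.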
 