[Figure: pipeline timing diagram over cycles 0 to 6 for the head (H), body (B), and tail (T) flits of the first packet and the head (H) flit of the second packet, showing the stages RC-VA-SA-DQ, SA-DQ, CC, SU, ST, and LT-BW.]
Fig. 9.12 An example of the operation of a 2-stage pipelined router that executes RC, VA, SA in
the first pipeline stage and ST in the next, for the flits of two packets that arrive at the same input
VC but acquire a different output VC in their selected output port
In cycle 3, while the head flit is crossing the link (LT) and being stored in the next
router's input buffer (BW), the body flit is traversing the crossbar and the tail flit of
the same packet participates in SA. In cycle 4, the tail flit leaves the router, also
releasing the allocated output VC. The head flit of the following packet does not have to
wait for the tail flit of the previous packet to leave, since in cycle 4 it allocates another
available output VC. Of course, if the second packet requested the same output VC
as the one already owned by the first packet, then, inevitably, the head flit would
complete VA no earlier than cycle 5 (after the tail flit releases the output VC in
cycle 4).
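The effect of output VC ownership on VA timing can be illustrated with a small model. This is only a sketch under simplifying assumptions: the names `OutputVCTable` and `first_va_cycle` are hypothetical, one output port is modeled, and a release scheduled for a cycle only becomes visible to VA in the following cycle.

```python
class OutputVCTable:
    """Tracks which packet (if any) owns each output VC of one output port."""

    def __init__(self, num_vcs):
        self.owner = [None] * num_vcs  # None means the output VC is free

    def allocate(self, requester, allowed=None):
        """Grant the first free output VC among `allowed`; return its index or None."""
        candidates = range(len(self.owner)) if allowed is None else allowed
        for vc in candidates:
            if self.owner[vc] is None:
                self.owner[vc] = requester
                return vc
        return None

    def release(self, vc):
        """Free an output VC, as done when a tail flit leaves the router."""
        self.owner[vc] = None


def first_va_cycle(table, requester, start_cycle, releases, allowed=None, max_cycle=32):
    """Return (cycle, vc) of the first successful VA attempt, or None.

    `releases` maps a cycle to the output VC released in that cycle
    (assumption: the release is seen by VA only from the next cycle on).
    """
    for cycle in range(start_cycle, max_cycle):
        vc = table.allocate(requester, allowed)
        if vc is not None:
            return cycle, vc
        if cycle in releases:
            table.release(releases[cycle])
    return None


# Second packet may take any output VC: it succeeds in cycle 4 on VC 1.
table = OutputVCTable(num_vcs=2)
table.allocate("packet A")  # first packet owns output VC 0
print(first_va_cycle(table, "packet B", start_cycle=4, releases={4: 0}))  # (4, 1)

# Second packet insists on VC 0: it must wait for the release in cycle 4.
table = OutputVCTable(num_vcs=2)
table.allocate("packet A")
print(first_va_cycle(table, "packet B", start_cycle=4, releases={4: 0}, allowed=[0]))  # (5, 0)
```

The two calls mirror the two cases in the text: a different free output VC is granted immediately, whereas a request for the VC still owned by the first packet is delayed to the cycle after the tail releases it.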
9.4.1 Credit Consume and State Update
Although this pipelined configuration is very similar to that of a wormhole router
that separates SA from ST, it still presents a major difference: CC and SU no longer
execute in parallel with SA, but are separated from it by a clock cycle.
The delayed SU inevitably adds a bubble to an output VC's flow after a tail leaves.
The delayed CC, however, has a different outcome. Once a flit has been granted
in the previous cycle and placed in the data pipeline register, another flit of the
same packet may issue a request to SA for the same output VC.
The requesting flit has to qualify its request with the ready state of the output VC;
a request can be made only if enough slots exist at the destination buffer. However,
the in-flight flit (in the data pipeline register) has not yet consumed its credit, but
is about to consume it in the current cycle. This delayed CC causes the input VCs
that are preparing their requests to SA to see an outdated credit value. This situation
corresponds to a forward latency L_f that is increased by one cycle between two flow-controlled
end-points (the input VC buffers of two neighboring routers), which, according to the
analysis made in Chap. 6, leads to the following requirements: First, the input VC
buffers should be augmented with either one extra slot, to guarantee safe operation
under the increased round-trip time, or two slots, for full-throughput operation.
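The stale credit view can be illustrated with a toy model. It is a sketch under stated assumptions and not the book's analysis: the function name `flits_granted` and its parameters are hypothetical, one flit requests SA per cycle, the downstream router never returns credits, and a credit consumed in a cycle is only visible to requests from the next cycle on.

```python
def flits_granted(initial_credits, cc_delay, cycles=32):
    """Count flits granted toward one output VC when no credits are returned.

    cc_delay = 0: the credit is consumed in the same cycle as the SA grant.
    cc_delay = 1: the credit is consumed one cycle after the grant (delayed CC).
    """
    credits = initial_credits
    consume_at = {}  # cycle -> number of credits consumed at the end of that cycle
    granted = 0
    for cycle in range(cycles):
        # The request is qualified with the credit value visible in this cycle,
        # which does not yet reflect flits still waiting to consume their credit.
        if credits > 0:
            granted += 1
            due = cycle + cc_delay
            consume_at[due] = consume_at.get(due, 0) + 1
        # Credit consumption takes effect at the end of the cycle.
        credits -= consume_at.pop(cycle, 0)
    return granted


print(flits_granted(initial_credits=2, cc_delay=0))  # 2
print(flits_granted(initial_credits=2, cc_delay=1))  # 3
```

In this model the delayed CC lets one more flit through than the credit count allows, which is why the downstream buffer needs one extra slot for safe operation. The model does not capture the full-throughput requirement, which depends on the credit round-trip time.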