[Fig. 9.15 diagram labels: input VC #i, RC, VA, SA, ST, outPort[i], outVC[i], outVCLock[i], credit update (CC), ready VC, replace VC id field, output #j]
Fig. 9.15 The 3-stage pipelined organization of the VC-based router that executes RC and VA
in the first pipeline stage, SA in the second, and ST in the last. Because RC and VA occur in
the same cycle, they are expected to form the critical path of this pipelined configuration.
However, depending on the number of input/output ports of the router, the number of VCs, and
the existence of virtual networks that separate the VCs into smaller independent groups, the
critical path may shift to the SA stage.
9.5.2 Three-Stage Pipelined Organization: RC-VA|SA|ST
In the second 3-stage pipelined organization for a VC-based router, RC and VA
are serially executed in the first pipeline stage, while the last two pipeline stages
are dedicated to SA and ST, respectively. The implementation of this pipelined
configuration is shown in Fig. 9.15.
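The stage assignment above can be traced cycle by cycle for a single packet. The following Python sketch is hypothetical and assumes there is no contention. VA always succeeds on the first try, an input VC receives at most one SA grant per cycle, and a flit may request SA no earlier than the cycle after it arrives (the head spends its arrival cycle in RC-VA). Body flits skip RC and VA because they inherit the output VC allocated to the head flit. The function name `schedule` and its output format are illustrative only.

```python
from typing import Dict, List, Sequence


def schedule(arrivals: Sequence[int]) -> List[Dict[str, int]]:
    """Cycle in which each flit of one packet occupies each pipeline stage.

    arrivals[k] is the cycle in which flit k reaches the input VC (flit 0 is
    the head). Hypothetical assumptions: no contention, VA always succeeds,
    at most one SA grant per input VC per cycle, and a flit can request SA
    no earlier than the cycle after it arrives.
    """
    timeline: List[Dict[str, int]] = []
    prev_sa = None
    for k, arrive in enumerate(arrivals):
        # The arrival cycle is spent in RC-VA (head) or writing the buffer (body).
        earliest = arrive + 1
        # The VC can win SA only once per cycle, so flits follow one another.
        sa = earliest if prev_sa is None else max(earliest, prev_sa + 1)
        stages = {"SA": sa, "ST": sa + 1}
        if k == 0:
            # Only the head flit performs RC and VA; body flits reuse its output VC.
            stages = {"RC-VA": arrive, **stages}
        timeline.append(stages)
        prev_sa = sa
    return timeline


# Head and body flit arriving back-to-back in cycles 1 and 2.
for flit in schedule([1, 2]):
    print(flit)
```

Under these assumptions, the head flit occupies RC-VA, SA, and ST in cycles 1, 2, and 3. The body flit follows one cycle behind, in SA during cycle 3 and ST during cycle 4.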
Since RC and VA occur in the same cycle, the outPort[i] register of the ith input
VC is bypassable, so that the result of RC can reach the VC allocator in the same
cycle. The first pipeline stage ends at the outVC[i] and outVCLock[i] pipeline
registers, whose outputs are used to set up the requests to SA. The result of SA is
distributed to all inputs, causing the winning flits to be dequeued and transferred
to the data pipeline register at the input of the crossbar. The flits at the input of
the crossbar are switched to their selected outputs, driven by the registered select
signals of the output multiplexers. The addition of the data pipeline register
increases the round-trip time between the two flow-control endpoints (the input
buffers of two neighboring routers). Thus, as explained in Sect. 9.4.1, the ready
condition for each output VC, as well as the depth of the input VC buffers, should
be modified accordingly.
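A simplified model of credit-based flow control shows why a longer round trip affects buffer sizing. The Python sketch below is a simplification and does not reproduce the exact timing of Sect. 9.4.1. It assumes that the downstream input VC drains one flit per cycle, so a credit consumed in cycle t becomes usable again in cycle t + round_trip. It also treats an output VC as ready whenever it holds at least one credit. The function name `throughput` and the round-trip values (4 cycles, assumed to grow to 5 with the extra data pipeline register) are hypothetical.

```python
def throughput(buffer_depth: int, round_trip: int, cycles: int = 1000) -> float:
    """Fraction of cycles in which the upstream router sends a flit.

    Simplified, hypothetical model: the downstream VC buffer drains one flit
    per cycle, so the credit spent on a flit sent in cycle t returns and is
    usable in cycle t + round_trip (round_trip >= 1).
    """
    credits = buffer_depth                    # free slots in the downstream VC buffer
    returning = [0] * (cycles + round_trip)   # credits arriving back in each cycle
    sent = 0
    for t in range(cycles):
        credits += returning[t]
        if credits > 0:                       # output VC is "ready" only with a credit
            credits -= 1
            sent += 1
            returning[t + round_trip] += 1
    return sent / cycles


# Hypothetical: a 4-cycle round trip becomes 5 cycles after adding the register.
for depth, rtt in [(4, 4), (4, 5), (5, 5)]:
    print(f"depth={depth} round_trip={rtt} throughput={throughput(depth, rtt)}")
```

In this model, a buffer of depth 4 sustains one flit per cycle with a 4-cycle round trip but drops to 0.8 flits per cycle once the round trip grows to 5. Full throughput is restored only when the buffer depth matches the new round-trip time.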
The operation of this pipelined configuration is shown in Fig. 9.16. In the first
two cycles, the head and the body flit of a packet arrive back-to-back. In cycle 1,
the head flit performs RC and uses the result to request and successfully allocate
an output VC. In cycle 2, it wins SA and moves forward to the intermediate pipeline