[Figure 7.8 diagram, labels: Input #0 to Input #N-1, VCs, RC, VA, SA, Credit Counters, outVCAvailable, ST, Output #0 to Output #N-1; signals: update, valid, data]
Fig. 7.8 The organization of a VC-based router that connects multiple inputs in parallel to multiple outputs, each of which supports many VCs. Routing computation (RC) is responsible for selecting an output port for each input VC, while VC allocation (VA) and switch allocation (SA) handle the allocation of the output VCs and the output ports to the requesting input VCs. The per-output multiplexers of the crossbar implement the actual transfer of flits in the switch traversal stage (ST).
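As a rough illustration of the stages named in the caption, the Python sketch below lists the order in which a flit visits them. It assumes, as is common in VC routers but not stated in the figure itself, that only head flits go through RC and VA, while body and tail flits reuse the output port and output VC reserved by their head flit and therefore pass only through SA and ST. The names `FlitType`, `stages_for`, `HEAD_STAGES`, and `FOLLOWER_STAGES` are hypothetical.

```python
from enum import Enum
from typing import Tuple


class FlitType(Enum):
    HEAD = "head"
    BODY = "body"
    TAIL = "tail"


# Assumption: a head flit needs every stage; body and tail flits inherit the
# output port (from RC) and the output VC (from VA) of their head flit.
HEAD_STAGES: Tuple[str, ...] = ("RC", "VA", "SA", "ST")
FOLLOWER_STAGES: Tuple[str, ...] = ("SA", "ST")


def stages_for(flit_type: FlitType) -> Tuple[str, ...]:
    """Return, in order, the router stages a flit of this type passes through."""
    return HEAD_STAGES if flit_type is FlitType.HEAD else FOLLOWER_STAGES


if __name__ == "__main__":
    for t in FlitType:
        print(f"{t.value}: {' -> '.join(stages_for(t))}")
```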
7.2.1 Routing Computation
The main difference between the generic many-to-many router and the simpler many-to-one switching module lies in routing computation and the selection logic it involves. In the many-to-many organization, every input VC is eligible to connect to an output VC of any output of the router. Therefore, each input VC is equipped with an outPort[i] variable that stores the output port that the packet currently in the i-th input VC must follow to reach its destination. The outPort[i] variable is updated after routing computation, which is performed only when the head flit of a packet reaches the frontmost position of the i-th input VC buffer. The outPort variable is reset to zero once the last flit of the packet, i.e., the tail flit, is granted to leave the corresponding input VC buffer.
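A minimal Python sketch of the outPort[i] bookkeeping described above, assuming a one-hot encoding for outPort[i] so that the reset value of zero means "no output port requested". The `Flit` and `InputVC` classes and the `route` callback are hypothetical stand-ins for the input VC buffer and the routing computation unit; they model behavior only, not hardware timing.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque


@dataclass
class Flit:
    is_head: bool
    is_tail: bool
    dest: int  # destination id; the sketch reads it only from head flits


@dataclass
class InputVC:
    # FIFO of flits waiting in this input VC buffer (front = index 0)
    buffer: Deque[Flit] = field(default_factory=deque)
    # Models outPort[i], assumed one-hot: bit p set means output port p;
    # 0 is the reset value and means "no output port requested".
    out_port: int = 0

    def route_if_head(self, route: Callable[[int], int]) -> None:
        """Run RC only when an unrouted head flit is at the front of the buffer."""
        if self.buffer and self.buffer[0].is_head and self.out_port == 0:
            port = route(self.buffer[0].dest)  # hypothetical routing function
            self.out_port = 1 << port

    def grant_front(self) -> Flit:
        """Dequeue the front flit after it is granted; reset outPort on the tail."""
        flit = self.buffer.popleft()
        if flit.is_tail:
            self.out_port = 0
        return flit
```

For example, a three-flit packet to destination 2 with `route = lambda d: d` sets `out_port` to `0b100` when its head flit reaches the front, keeps it while the body flit leaves, and clears it when the tail flit is granted.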
The simplest implementation would introduce a routing computation unit per input VC, as shown in Fig. 7.9a. Depending on the complexity of the routing computation unit, this choice may not be the best one. Since at most one new head flit can arrive at each input per clock cycle, routing computation is needed for only one packet per input in each cycle. Hence, the routing computation unit can be shared among all the VCs of an input, as depicted in Fig. 7.9b. Although a shared routing computation unit appears to save area, it is not the best choice in the area-delay sense. The delay overhead of the multiplexer and the arbitration unit (a simple fixed-priority arbiter) may increase the implementation area when the design is synthesized under strict delay constraints, since the extra delay must be recovered by upsizing the logic. In what follows, we assume that each input VC is equipped with its own routing computation unit.
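The trade-off between per-VC routing computation and a shared unit can be sketched in Python as follows. The shared version places a fixed-priority arbiter in front of a single routing function, so at most one VC per input is routed per call (standing in for one clock cycle), while the per-VC version routes every eligible VC at once. All function names and the `route` callback are hypothetical, and the sketch captures only the selection behavior, not the multiplexer delay or the area that motivates the choice.

```python
from typing import Callable, Dict, List, Optional, Tuple


def fixed_priority_arbiter(requests: List[bool]) -> Optional[int]:
    """Grant the lowest-indexed active request; VC 0 has the highest priority."""
    for i, req in enumerate(requests):
        if req:
            return i
    return None


def shared_rc_step(
    front_dest: List[Optional[int]],
    route: Callable[[int], int],
) -> Optional[Tuple[int, int]]:
    """One cycle of a single RC unit shared by all VCs of an input.

    front_dest[v] holds the destination of an unrouted head flit at the front
    of VC v, or None. Returns (granted VC, output port), or None if idle.
    """
    grant = fixed_priority_arbiter([d is not None for d in front_dest])
    if grant is None:
        return None
    dest = front_dest[grant]
    assert dest is not None  # guaranteed by the arbiter's request vector
    return grant, route(dest)


def per_vc_rc_step(
    front_dest: List[Optional[int]],
    route: Callable[[int], int],
) -> Dict[int, int]:
    """One cycle with a private RC unit per VC: every eligible VC is routed."""
    return {v: route(d) for v, d in enumerate(front_dest) if d is not None}
```

With `front_dest = [None, 5, 3]` and `route = lambda d: d % 4`, the shared unit returns `(1, 1)`, routing only VC 1 in this cycle, whereas the per-VC version returns `{1: 1, 2: 3}`.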