Hardware Reference
In-Depth Information
number of inputs and thus the increase of the logic depth of the SA2 arbiters is
balanced by the decrease of the logic depth of the SA1 arbiters.
Input speedup is a useful technique for handling the possible inefficiencies of
separable switch allocation (Dally and Towles 2004; Rao et al. 2014). Its main
drawback is the increased wiring between the input VC buffers and the output
multiplexers of the crossbar, which may limit the effectiveness of input speedup when
the network operates using very wide flits. Input speedup also requires changing the
flow-control mechanism, since multiple credit updates may need to return in each
cycle: one from each input VC that was selected by the switch allocator.
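The flow-control change can be illustrated with a minimal sketch, assuming a simple credit counter per VC at the upstream side. The function name apply_credit_updates and its parameters are hypothetical and only serve the illustration; the point is that, with input speedup, the list of credits returned in one cycle may name several VCs of the same input instead of at most one.

```python
def apply_credit_updates(credits, returned_vcs, max_credits):
    """Return the per-VC credit counters after one cycle of credit returns.

    credits      -- current credit count of each VC (list indexed by VC id)
    returned_vcs -- VC ids that dequeued a flit downstream in this cycle;
                    with input speedup this list may hold several VCs
    max_credits  -- buffer depth per VC (assumed equal for all VCs)
    """
    # Each VC dequeues at most one flit per cycle, so it returns at most one credit.
    if len(set(returned_vcs)) != len(returned_vcs):
        raise ValueError("a VC can return at most one credit per cycle")

    updated = list(credits)
    for vc in returned_vcs:
        if updated[vc] >= max_credits:
            raise ValueError(f"credit overflow on VC {vc}")
        updated[vc] += 1
    return updated
```

Without input speedup, returned_vcs would contain at most one entry per input and cycle, so a single credit wire plus a VC id would suffice; with speedup the credit channel has to carry one update per selected VC.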
Applying input speedup to its maximum extent, and assuming that each input
VC represents an isolated virtual network, allows us to design VC-based routers
using simple wormhole switches in parallel. Once each VC is a separate virtual
network, VA is not required, since no packet will ever change the VC that it
already uses. Also, since maximum speedup is enabled, the packets that belong
to the same VC but come from different inputs can be switched together using a
private wormhole router, as shown in Fig. 8.7. For a router that supports V VCs,
V wormhole routers are used in parallel, each one handling the flits of one VC
from all inputs (Gilabert et al. 2010). Each wormhole router resolves output
contention independently of the rest and prepares the flits that should depart from
each output. At the output of the VC-based router, the flit of one sub-router must
be selected, which effectively selects the VC that will use the output link in this
cycle. This requires an additional arbiter and multiplexer that select which VC should
be served by each output. By keeping an independent flow-control mechanism between
the input VC buffers and the inputs of the wormhole routers, as well as between the
outputs of the wormhole routers and the output of the VC-based router, single-cycle
or multi-cycle/pipelined configurations can be derived. For example, if we assume that each
[Figure: Input #0 ... Input #N-1; VC #0 ... VC #V-1, each served by a private NxN wormhole router; Output #0 ... Output #N-1]
Fig. 8.7 Each VC of the router can be serviced by a private wormhole router. The results of all
sub-routers are merged at the output of the VC-based router using another arbitration and
multiplexing step that also merges the VC-based flow control of the output links
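The organization of Fig. 8.7 can be sketched in a few lines, under strong simplifying assumptions: one cycle is modeled in isolation, every input VC holds at most one flit that requests one output, wormhole packet locking and credits are ignored, and the sub-routers use a fixed-priority arbiter starting from input 0. All function names and the choice of round-robin arbitration among VCs at the output are hypothetical and are not taken from the text.

```python
def round_robin_pick(candidates, pointer, size):
    """Return the first index in `candidates` at or after `pointer`, wrapping around."""
    for offset in range(size):
        idx = (pointer + offset) % size
        if idx in candidates:
            return idx
    return None  # nobody is requesting


def wormhole_subrouter(requests, num_ports):
    """Resolve output contention for the flits of ONE VC.

    requests[i] is the output port requested by input i, or None if idle.
    Returns winners[o]: the input granted output o, or None.
    """
    winners = []
    for out in range(num_ports):
        contenders = {i for i, req in enumerate(requests) if req == out}
        # Assumption: fixed priority, i.e. the pointer always sits at input 0.
        winners.append(round_robin_pick(contenders, 0, num_ports))
    return winners


def vc_router_cycle(requests_per_vc, num_ports, vc_pointers):
    """Run V private sub-routers, then pick one VC per output.

    requests_per_vc[v][i] is the output requested on VC v of input i.
    vc_pointers[o] is the round-robin pointer of output o's VC arbiter.
    Returns grants[o]: a (vc, input) pair, or None if output o stays idle.
    """
    num_vcs = len(requests_per_vc)
    # Step 1: each sub-router works independently of the others.
    per_vc = [wormhole_subrouter(reqs, num_ports) for reqs in requests_per_vc]

    # Step 2: the extra arbiter and multiplexer at each output select one VC.
    grants = []
    for out in range(num_ports):
        ready = {v for v in range(num_vcs) if per_vc[v][out] is not None}
        vc = round_robin_pick(ready, vc_pointers[out], num_vcs)
        grants.append(None if vc is None else (vc, per_vc[vc][out]))
    return grants
```

Calling vc_router_cycle([[0, 0], [1, 0]], 2, [0, 0]) models a 2x2 router with two VCs: on VC 0 both inputs compete for output 0, while on VC 1 the two inputs request different outputs. No VC allocation step appears anywhere in the sketch, which reflects the assumption that each VC is an isolated virtual network.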
 