7.1.7 Output-First Allocation
The order of arbitration in either VA or SA can be changed from input-first to
output-first. In output-first allocation, all input VCs first forward their requests to
the output arbiters for SA and to the output VC arbiters for VA. As a result, in VA,
one input VC may receive a grant from more than one output VC.
Selecting one of them requires an additional local per-input VC arbitration step.
Similarly, in SA with output-first arbitration, two input VCs
of the same input may simultaneously receive a grant from the same or a different
output. Since only one input VC can be served from each input, an additional
arbitration step must then resolve the conflict.
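The two SA steps described above can be sketched in a few lines of Python. This is only an illustrative model, not the router's actual logic: the function name `output_first_sa`, the request encoding, and the fixed-priority choice inside each arbiter are assumptions made for brevity (a real design would typically use round-robin or similar arbiters). The sketch also models one grant per output arbiter, so conflicts arise only between grants from different outputs.

```python
def output_first_sa(requests, num_outputs):
    """Sketch of output-first switch allocation (hypothetical helper).

    requests: dict {(input_port, vc): output_port}, one request per input VC.
    Returns a dict {input_port: (vc, output_port)} with the final winners.
    """
    # Step 1 (global, per output): each output arbiter grants one of the
    # input VCs that request it. Fixed priority is assumed here: the lowest
    # (input_port, vc) pair wins.
    output_grants = {}
    for out in range(num_outputs):
        candidates = sorted(ivc for ivc, o in requests.items() if o == out)
        if candidates:
            output_grants[out] = candidates[0]

    # Step 2 (local, per input): an input can serve only one VC per cycle,
    # so when several of its VCs were granted, keep a single one.
    winners = {}
    for out, (inp, vc) in sorted(output_grants.items()):
        if inp not in winners:
            winners[inp] = (vc, out)
    return winners


# Input 0 has two VCs granted by different outputs; only one can be served.
requests = {(0, 0): 1, (0, 1): 2, (1, 0): 1}
print(output_first_sa(requests, num_outputs=3))
```

In this example both VCs of input 0 win their output arbiters, the local step keeps only one of them, and the grant issued by output 2 goes unused for that cycle.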
Output-first allocation has been shown to achieve better matching quality
than input-first allocation (Becker and Dally 2009). However, in
terms of hardware implementation, input-first allocation is more delay efficient. The
reason is that input-first allocation decisions allow the necessary multiplexing
to proceed concurrently with arbitration. For example, the grants of SA1 can
be used directly to multiplex the flit of the winning VC in parallel with SA2 arbitration.
Thus, when SA2 finishes, the data at the output multiplexer are ready and waiting for the
corresponding grants. In contrast, in output-first allocation, both the input and the
output multiplexers must wait for both SA2 and SA1 to complete before switching
the flits from the input VCs to the output. In pipelined implementations these
differences are partially alleviated, but input-first allocation
still yields faster circuits.
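The delay argument can be made concrete with a rough single-cycle model. The sketch below is an assumption-laden simplification: the function names and the unitless delay values are hypothetical, and wiring, grant-combination logic, and pipelining are ignored. It only captures which steps overlap in each ordering.

```python
def input_first_delay(t_arb_in, t_arb_out, t_mux_in, t_mux_out):
    # SA1 grants drive the input multiplexer while SA2 arbitrates in
    # parallel, so only the slower of the two sits on the critical path.
    return t_arb_in + max(t_mux_in, t_arb_out) + t_mux_out


def output_first_delay(t_arb_in, t_arb_out, t_mux_in, t_mux_out):
    # Both arbitration steps must finish before either multiplexer can
    # switch, so all four delays add up in series.
    return t_arb_out + t_arb_in + t_mux_in + t_mux_out


# Arbitrary, unitless delays chosen only for illustration.
delays = dict(t_arb_in=3, t_arb_out=3, t_mux_in=2, t_mux_out=2)
print(input_first_delay(**delays), output_first_delay(**delays))
```

Under this model the output-first path is never shorter than the input-first one, since the input multiplexer delay is hidden behind SA2 only in the input-first case.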
7.2 Many-to-Many Connections Using an Unrolled Datapath: A Complete VC-Based Router
The design of a generic VC-based router that supports many-to-many connections
using a fully unrolled switching datapath, i.e., a crossbar, can be easily derived as
an extension of the many-to-one switching module presented earlier. The baseline
datapath of the generic VC-based router is shown in Fig. 7.8. As in the
many-to-one case, a pipeline register is used at each output, which isolates the timing path
of the link from the paths of the router.
The presented router is just an unrolled version of the baseline switching module
shown in Fig. 7.1. Every output is equipped with an output multiplexer, together with
V credit counters used for link-level flow control and V
outVCAvailable flags used during VC allocation. The VA and SA stages
operate in a separable manner, taking local (per-input or per-input VC) and global (per-output
or per-output VC) decisions that guide the assignment of input VCs to output VCs
and the allocation of the output ports of the router on a cycle-by-cycle basis.
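The per-output bookkeeping mentioned above (V credit counters and V outVCAvailable flags) can be pictured with a small behavioral model. The class and method names below are hypothetical, the buffer depth is an assumed parameter, and releasing the output VC when the tail flit departs is just one possible policy chosen for this sketch.

```python
class OutputPortState:
    """Behavioral sketch of the state kept at one router output."""

    def __init__(self, num_vcs, buffer_depth):
        # One credit counter per output VC; each starts with as many
        # credits as the (assumed) downstream buffer depth.
        self.credits = [buffer_depth] * num_vcs
        # One outVCAvailable flag per output VC, consulted during VA.
        self.out_vc_available = [True] * num_vcs

    def allocate_vc(self):
        """VA: claim the first free output VC, or return None if none is free."""
        for vc, free in enumerate(self.out_vc_available):
            if free:
                self.out_vc_available[vc] = False
                return vc
        return None

    def can_send(self, vc):
        """SA qualifies a request only when the output VC has credits."""
        return self.credits[vc] > 0

    def send_flit(self, vc, is_tail=False):
        """A flit leaving on this VC consumes one credit."""
        if self.credits[vc] == 0:
            raise RuntimeError("no credits left on output VC %d" % vc)
        self.credits[vc] -= 1
        if is_tail:
            # Assumed policy: free the output VC once the tail has left.
            self.out_vc_available[vc] = True

    def return_credit(self, vc):
        """The downstream router freed one buffer slot of this VC."""
        self.credits[vc] += 1


port = OutputPortState(num_vcs=2, buffer_depth=2)
vc = port.allocate_vc()          # a packet wins output VC 0 in VA
port.send_flit(vc)               # head flit
port.send_flit(vc, is_tail=True) # tail flit releases the VC
print(port.credits, port.out_vc_available)
```

A complete router would instantiate one such state block per output and update it every cycle from the VA and SA grants and from the credits returning over the link.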