Baseline Virtual-Channel Based Switching Modules and Routers - Microarchitecture of Network-on-Chip Routers - page 117

[Figure 7.8 (diagram not reproduced). Labeled blocks: RC, VCs, VA, SA, ST, Credit Counters, outVCAvailable. Ports: Input #0 to Input #N-1 and Output #0 to Output #N-1, each with valid, data, and update signals.]

Fig. 7.8 The organization of a VC-based router that connects multiple inputs to multiple outputs in parallel, each of which supports many VCs. Routing computation (RC) is responsible for selecting an output port for each input VC, while VC allocation (VA) and switch allocation (SA) handle the allocation of the output VCs and the output ports to the requesting input VCs. The per-output multiplexers of the crossbar implement the actual transfer of flits in the switch traversal stage (ST).

7.2.1 Routing Computation

The main difference between the generic many-to-many router and the simpler many-to-one switching module is the role of routing computation and the selection logic involved. In the many-to-many organization, every input VC is eligible to connect to any output VC of any output of the router. Therefore, each input VC is equipped with an outPort[i] variable that stores the output port that the packet currently in the i-th input VC must follow to reach its destination. The outPort[i] variable is updated after routing computation, which is performed only when the head flit of a packet reaches the frontmost position of the i-th input VC buffer. The outPort[i] variable is reset to zero once the last flit of the packet, i.e., the tail flit, is granted to leave the corresponding input VC buffer.
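The life cycle of outPort[i] described above can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the hardware implementation: the Flit and InputVC classes, the dest field, and the route callback are hypothetical names, and outPort is assumed to be one-hot encoded so that zero means "no output port requested".

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque


@dataclass
class Flit:
    is_head: bool
    is_tail: bool
    dest: int = 0  # hypothetical destination field, read only for head flits


@dataclass
class InputVC:
    # Hypothetical routing function: destination -> output port index.
    route: Callable[[int], int]
    buffer: Deque[Flit] = field(default_factory=deque)
    # Models outPort[i]; assumed one-hot, so 0 means "no port requested".
    out_port: int = 0

    def routing_computation(self) -> None:
        """Run RC only when a not-yet-routed head flit is at the buffer front."""
        if self.buffer and self.buffer[0].is_head and self.out_port == 0:
            self.out_port = 1 << self.route(self.buffer[0].dest)

    def dequeue_granted(self) -> Flit:
        """Remove the granted front flit; reset outPort when it is the tail."""
        flit = self.buffer.popleft()
        if flit.is_tail:
            self.out_port = 0
        return flit


if __name__ == "__main__":
    vc = InputVC(route=lambda dest: dest % 4)  # toy routing rule for the demo
    vc.buffer.extend([Flit(True, False, dest=6), Flit(False, False), Flit(False, True)])
    vc.routing_computation()
    print(bin(vc.out_port))   # port held while the packet drains
    for _ in range(3):
        vc.dequeue_granted()
    print(vc.out_port)        # reset after the tail flit leaves
```

In this model the stored port stays constant for body flits, so routing computation is not repeated for them; only the departure of the tail flit frees the variable for the next packet.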

The simplest implementation introduces one routing computation unit per input VC, as shown in Fig. 7.9a. Depending on the complexity of the routing computation unit, this choice may not be the best one. Since at most one new head flit arrives per clock cycle at each input, routing computation is needed for only one packet per input in each cycle. Hence, the routing computation unit can be shared among all VCs of an input, as depicted in Fig. 7.9b. Although a shared routing computation unit looks like an area saver, it is not the best choice in the area-delay sense. The delay overhead of the multiplexer and the arbitration unit (just a simple fixed-priority arbiter) may lead to increased implementation area when the design is synthesized under strict delay constraints. In the rest of this text we assume that each input VC is equipped with its own routing computation unit.
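The two organizations can be contrasted with a short behavioral sketch. It is an illustration under assumptions, not the design of Fig. 7.9: the function names, the head_dests list (destination of the unrouted head flit at the front of each VC, or None), and the route callback are hypothetical. The shared variant places a fixed-priority arbiter and a multiplexer in front of a single routing unit, which is the extra logic the text identifies as delay overhead.

```python
from typing import Callable, List, Optional, Tuple


def fixed_priority_arbiter(requests: List[bool]) -> Optional[int]:
    """Grant the lowest-indexed active request, or None if nothing requests."""
    for i, req in enumerate(requests):
        if req:
            return i
    return None


def shared_rc(
    head_dests: List[Optional[int]], route: Callable[[int], int]
) -> Optional[Tuple[int, int]]:
    """One routing unit for all VCs of an input: arbitrate, multiplex, route.

    Returns (granted VC index, output port), or None when no VC has an
    unrouted head flit at its front.
    """
    winner = fixed_priority_arbiter([d is not None for d in head_dests])
    if winner is None:
        return None
    return winner, route(head_dests[winner])  # mux selects the winner's head


def per_vc_rc(
    head_dests: List[Optional[int]], route: Callable[[int], int]
) -> List[Optional[int]]:
    """One routing unit per VC: every pending head is routed independently."""
    return [None if d is None else route(d) for d in head_dests]


if __name__ == "__main__":
    toy_route = lambda dest: dest % 4  # toy routing rule for the demo
    print(shared_rc([None, 7, None], toy_route))
    print(per_vc_rc([None, 7, 5], toy_route))
```

The per-VC variant needs no arbitration or multiplexing, so its critical path contains only the routing function itself, at the cost of replicating that function for every VC.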
