[Fig. 7.14 block diagram: the outVCAvailable flags (V per input VC) are masked with the reqVC[i] request bits (NxV requests); N parallel V:1 arbiters produce selOutVC[i], for a total of N V:1 arbiters.]
Fig. 7.14 An alternative organization of the VA1 stage of the VC allocator that offers delay benefits at a small area overhead. It replaces a mux, one arbiter, and a demux with N arbiters that run in parallel and prepare the output VC requests of each input VC in a form that directly matches the connections of the arbiters in the VA2 stage
available output VCs, and then in VA2 each output VC selects at most one input VC. The input VCs are informed by the arbiters of VA2 whether their request was finally accepted.
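The two-stage protocol above can be sketched as a small behavioral model. This is an illustrative sketch, not the book's RTL: the function and variable names (`vc_allocate`, `out_port`, `out_vc_available`) are assumptions, and both stages use simple fixed-priority arbiters for brevity.

```python
def fixed_priority_arb(requests):
    """Grant the lowest-indexed asserted request (one-hot grant vector)."""
    grant = [0] * len(requests)
    for i, r in enumerate(requests):
        if r:
            grant[i] = 1
            break
    return grant

def vc_allocate(out_port, out_vc_available, N, V):
    """Behavioral sketch of two-stage VC allocation.
    out_port[i]: destined output port of input VC i (None if idle).
    out_vc_available[p][v]: availability flag of VC v at output port p.
    Returns, per input VC, the granted (port, vc) pair or None."""
    # VA1: each input VC arbitrates among the available VCs
    # of its destined output port.
    va1_choice = []
    for i in range(N * V):
        p = out_port[i]
        if p is None:
            va1_choice.append(None)
            continue
        g = fixed_priority_arb(out_vc_available[p])
        va1_choice.append((p, g.index(1)) if 1 in g else None)
    # VA2: each output VC grants at most one input VC
    # (lowest-indexed requester wins in this sketch).
    grant = [None] * (N * V)
    claimed = set()
    for i, choice in enumerate(va1_choice):
        if choice is not None and choice not in claimed:
            grant[i] = choice
            claimed.add(choice)
    return grant
```

For example, with N = 2 ports and V = 2 VCs, if input VCs 0 and 1 both target port 1, only one of them wins its chosen output VC in VA2; the loser is informed by receiving no grant.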
Faster Organization of the VA1 Stage
Implementation results prove that the (de)multiplexing logic at VA1 has a non-trivial contribution to the overall delay of VC allocation. A simple microarchitectural change can completely eliminate this logic and significantly speed up VC allocation. The new fast organization of VA1 is shown in Fig. 7.14.
First, the output VC availability flags of all outputs are masked with the reqVC vector of each input VC, without any pre-selection step. The resulting availability vectors, one per output, are independently arbitrated by V:1 arbiters, each selecting one available VC for its output. Of the selected output VCs (one available VC per output), each input VC needs only one: the one that belongs to its destined output port. Selecting it does not require any multiplexing, just an additional masking operation with the output port request (outPort[i]) of the i-th input VC. After this masking, the selected output VCs of all outputs become zero except the one that matches the destined output port. Therefore, after this last step, the output VC request of an input VC is ready and aligned per output, exactly as needed by the output VC arbiters of the second stage. Thus, no additional demultiplexing/alignment logic is needed and significant delay is saved. The cost of this method is that it replaces a mux (the outVCAvailable multiplexer of Fig. 7.13), one arbiter, and a demux (Fig. 7.13) with N arbiters that run in parallel and offer a faster implementation.
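The three steps of this fast VA1 organization (mask with reqVC, N parallel V:1 arbitrations, final mask with outPort[i]) can be sketched for one input VC as follows. This is a behavioral sketch under assumed naming, not the book's implementation; bit k = p*V + v of each flat vector is taken to correspond to VC v of output port p.

```python
def fast_va1(req_vc, out_port_onehot, out_vc_available, N, V):
    """Fast VA1 stage of Fig. 7.14 for one input VC (behavioral sketch).
    req_vc: N*V-bit request mask of the input VC, grouped per output.
    out_port_onehot: N-bit one-hot destined-output-port request.
    out_vc_available: flat N*V-bit availability vector of all outputs.
    Returns the N*V-bit output VC request, already aligned per output."""
    # Step 1: mask all availability flags with the input VC's requests;
    # no pre-selection of the destined output is performed.
    masked = [req_vc[k] & out_vc_available[k] for k in range(N * V)]
    # Step 2: N parallel V:1 fixed-priority arbiters, one per output,
    # each selecting one available VC for its output.
    sel = [0] * (N * V)
    for p in range(N):
        for v in range(V):
            if masked[p * V + v]:
                sel[p * V + v] = 1  # first available VC of output p wins
                break
    # Step 3: AND with outPort[i]; only the winner of the destined
    # output survives, so no demultiplexing/alignment logic is needed.
    return [sel[p * V + v] & out_port_onehot[p]
            for p in range(N) for v in range(V)]
```

With N = 2 and V = 2, an input VC requesting any VC of output port 1 when only VC 1 of that port is available ends up with the single bit for (port 1, VC 1) asserted, already positioned where the VA2 arbiter of that output VC expects it.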
Please notice also that, since the outPort[i] request bits are used only after the V:1 arbitration step, routing computation can be overlapped in time with the
 