Pipelined Wormhole Routers - Microarchitecture of Network-on-Chip Routers


[Figure: per-cycle diagram of one router input over cycles 0 to 5, showing head (H), body (B) and tail (T) flits performing LT-BW, RC-SA-DQ-ST or SA-DQ-ST, credit consumption (cc) and outAvailable update (su)]

Fig. 5.3 The execution of necessary operations on a single-cycle wormhole router

within a single cycle: (a) the flit's destination field feeds the RC unit, and the outPort bypass path feeds the request generation logic; (b) provided that the output is available, the flit performs SA while, in parallel, it consumes a credit (CC) and updates (SU) the outAvailable flag; finally, (c) the grant produced by SA is used to dequeue (DQ) the flit from the input buffer so that it can traverse the crossbar (ST).
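To make steps (a) to (c) concrete, the following Python sketch models one clock cycle at a single input. It is a behavioral approximation, not the book's hardware: the names `Flit`, `OutputState`, `route` and `single_cycle`, the depth of four credits and the routing function are all hypothetical, and SA is reduced to a trivial grant because only one input is modeled.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Flit:
    kind: str      # "H" (head), "B" (body) or "T" (tail)
    dest: int = 0  # destination field; only read for head flits

@dataclass
class OutputState:
    out_available: bool = True  # outAvailable: no packet currently holds this output
    credits: int = 4            # free downstream buffer slots (hypothetical depth)

def route(dest: int) -> int:
    # Hypothetical routing function: the destination id names the output port.
    return dest

def single_cycle(buf, out_port, outputs):
    """One clock cycle at one input (hypothetical model).

    Returns (flit that crossed the crossbar or None, new outPort register value)."""
    if not buf:
        return None, out_port
    flit = buf[0]
    # (a) RC for a head flit; its result bypasses the outPort register and
    # feeds request generation directly. Body and tail reuse the stored outPort.
    port = route(flit.dest) if flit.kind == "H" else out_port
    out = outputs[port]
    # A head may request only a free output; every flit needs a credit.
    may_request = out.out_available if flit.kind == "H" else True
    if not (may_request and out.credits > 0):
        return None, out_port          # stalled: no request reaches SA
    # (b) SA grant (trivial: one input modeled), in parallel with CC and SU.
    out.credits -= 1                   # CC: consume one downstream credit
    if flit.kind == "H":
        out.out_available = False      # SU: the head allocates the output
    elif flit.kind == "T":
        out.out_available = True       # SU: the tail releases the output
    # (c) The grant dequeues the flit (DQ), which then traverses the crossbar (ST).
    buf.popleft()
    return flit, port

# Usage: a three-flit packet heading to (hypothetical) output port 2.
outputs = {1: OutputState(), 2: OutputState()}
buf = deque([Flit("H", dest=2), Flit("B"), Flit("T")])
out_port = None
trace = []
for cycle in range(3):
    flit, out_port = single_cycle(buf, out_port, outputs)
    trace.append((cycle, flit.kind if flit else None, out_port))
print(trace)
```

Each flit leaves one cycle after the previous one; the tail's SU restores `out_available`, so the port is free again for any other input.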

As the head flit moves forward to the output pipeline register, a body flit is written at the input buffer. The output buffer has enough credits available, allowing the newly arrived body flit to use the stored outPort value and generate a request to SA. Being the only active request (the requests of all other inputs are nullified, since outAvailable = 0), the body flit is granted and moves forward after consuming a credit. In the same cycle, the head flit moves on to the next router. In cycle 3, the tail flit follows the same procedure, also performing SU in order to release the allocated port (outAvailable = 1), while the next packet's head flit arrives. In cycle 4, all previously allocated resources are already free, and the following packet can generate a request and participate in arbitration, regardless of its destination output port. Observing the rate of incoming and outgoing flits at this input, one notices that a flit requires only a single cycle to exit the router, and no extra cycles are added between packets. A flit may stall only if (a) all the output buffer's slots are full, or (b) a head flit loses in arbitration (in this case the output port is still utilized, but by a different input).
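The per-cycle behavior described above, including stall condition (a), can be reproduced with a small trace simulator. This is a sketch under stated assumptions: `simulate` is a hypothetical helper, all packets target the same output, a flit written in cycle c may leave in cycle c + 1 at the earliest, and no credits are returned during the trace, so a small credit count eventually fills the output buffer. Condition (b) is not modeled because only one input is simulated.

```python
from collections import deque

def simulate(arrivals, credits, cycles):
    """Cycle-level trace of one input of a single-cycle wormhole router.

    arrivals: dict mapping cycle -> flit label ("H1", "B1", "T1", ...).
    credits:  free slots in the downstream buffer; by assumption none are
              returned during the trace.
    Returns the list of (cycle, label) departures."""
    buf = deque()
    out_available = True
    departures = []
    for cycle in range(cycles):
        # Control and data path for the flit already stored in the buffer.
        if buf:
            flit = buf[0]
            may_request = out_available if flit.startswith("H") else True
            if may_request and credits > 0:    # otherwise stall (a): buffer full
                credits -= 1                    # CC
                if flit.startswith("H"):
                    out_available = False       # SU: allocate the output
                elif flit.startswith("T"):
                    out_available = True        # SU: release the output
                departures.append((cycle, buf.popleft()))  # DQ + ST
        # LT-BW: the flit arriving in this cycle is written at the input buffer.
        if cycle in arrivals:
            buf.append(arrivals[cycle])
    return departures

stream = {0: "H1", 1: "B1", 2: "T1", 3: "H2"}
print(simulate(stream, credits=8, cycles=6))
# [(1, 'H1'), (2, 'B1'), (3, 'T1'), (4, 'H2')]
print(simulate(stream, credits=2, cycles=6))
# [(1, 'H1'), (2, 'B1')]
```

With eight credits the four flits leave in consecutive cycles, with no bubble between the two packets; with only two credits the tail stalls once the downstream buffer is full.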

In the rest of this chapter, we modify the baseline single-cycle organization reviewed in this section step by step, deriving pipelined implementations that isolate the RC and SA stages from the rest of the router in order to increase its clock frequency. The primitive RC and SA pipelined organizations are then combined in a plug-and-play manner to derive three-stage pipelined organizations that reach even higher clock frequencies.
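A first-order way to see why isolating RC and SA raises the clock frequency is to take the clock period as the delay of the slowest stage plus a fixed pipeline-register overhead. The delays below are hypothetical placeholders, not measurements from the book, the stage groupings are illustrative labels, and the model ignores wiring, clock skew and any extra control logic that pipelining introduces.

```python
# Hypothetical logic delays in picoseconds (illustrative only).
DELAY_PS = {"RC": 150, "SA": 300, "DQ-ST": 250}
REG_OVERHEAD_PS = 50  # assumed clock-to-q plus setup time of a pipeline register

def clock_period(stages):
    """First-order clock period: slowest stage plus register overhead.

    `stages` is a list of pipeline stages; each stage lists the operations
    it performs serially within one cycle."""
    slowest = max(sum(DELAY_PS[op] for op in stage) for stage in stages)
    return slowest + REG_OVERHEAD_PS

ORGANIZATIONS = {
    "single-cycle": [["RC", "SA", "DQ-ST"]],
    "RC | SA-DQ-ST": [["RC"], ["SA", "DQ-ST"]],
    "RC-SA | DQ-ST": [["RC", "SA"], ["DQ-ST"]],
    "RC | SA | DQ-ST": [["RC"], ["SA"], ["DQ-ST"]],
}

for name, stages in ORGANIZATIONS.items():
    period = clock_period(stages)
    # 1 / (period ps) = 1e6 / period MHz
    print(f"{name:16s} {period} ps, about {1e6 / period:.0f} MHz")
```

Under these assumed numbers each cut shortens the critical path, and the three-stage split gives the shortest period; the price, which this model does not capture, is that a flit now spends more cycles inside the router.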

5.2 The Routing Computation Pipeline Stage

Separating RC from SA and ST is the simplest form of pipelining that can be applied to the router. Since RC is the first operation on the router's control path, the RC pipelined organization adds a pipeline register only on the control path of the router, resulting in the organization shown in Fig. 5.4.
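The effect of that pipeline register can be sketched behaviorally: SA no longer sees RC's combinational output, but the outPort value captured at the previous clock edge. The class `RcPipelineRegister` and the `route` function below are hypothetical illustrations; the stall and bypass handling of the actual organization in Fig. 5.4 is not reproduced here.

```python
from typing import Optional

def route(dest: int) -> int:
    # Hypothetical routing function: the low two bits of the destination pick the port.
    return dest & 0b11

class RcPipelineRegister:
    """Hypothetical model of the register placed after RC on the control path.

    Each call to clock() is one clock edge. SA works on the value captured at
    the previous edge, while RC's result for the current head flit is captured
    for use in the next cycle. The data path is not modeled."""

    def __init__(self) -> None:
        self.out_port: Optional[int] = None

    def clock(self, head_dest: Optional[int]) -> Optional[int]:
        seen_by_sa = self.out_port  # registered outPort used by SA this cycle
        self.out_port = None if head_dest is None else route(head_dest)
        return seen_by_sa

reg = RcPipelineRegister()
# Cycle 0: RC for a head with destination 6; cycle 1: idle; cycle 2: RC for destination 9.
seen = [reg.clock(d) for d in (6, None, 9)]
print(seen)  # [None, 2, None]
```

The RC result for destination 6 reaches SA one cycle later, so RC delay no longer adds to the SA-DQ-ST path within a single cycle.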
