Pipelined Virtual-Channel-Based Routers - Microarchitecture of Network-on-Chip Routers

Also, the condition under which an output VC is considered ready should be altered

from creditCounter[i] > 0 to creditCounter[i] > 1, irrespective of the number of

added buffer slots.
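
The effect of the stricter readiness test can be illustrated with a small simulation. The sketch below is not taken from the text: it assumes that the decrement of a credit counter becomes visible one cycle after the grant that caused it, that flits are sent back-to-back, and that no credits return during the run. The function name and its parameters are hypothetical.

```python
def flits_sent_before_stall(credits, threshold, cycles=10):
    """Count flits sent to one output VC under a one-cycle-late credit update.

    Assumptions (illustrative only): the allocator checks the counter before
    the previous cycle's decrement is applied, and no credits are returned.
    """
    counter = credits   # value the allocator sees at the start of a cycle
    pending = 0         # decrement caused by last cycle's grant, not yet applied
    sent = 0
    for _ in range(cycles):
        ready = counter > threshold   # readiness test on the stale counter
        counter -= pending            # last cycle's decrement lands now
        pending = 1 if ready else 0   # a ready VC sends one flit this cycle
        sent += pending
    return sent


# With 3 downstream buffer slots:
overflow_case = flits_sent_before_stall(credits=3, threshold=0)  # 4 flits sent
safe_case = flits_sent_before_stall(credits=3, threshold=1)      # 3 flits sent
```

Under these assumptions the test creditCounter[i] > 0 lets one flit more than the available slots leave the router, while creditCounter[i] > 1 stops at exactly the number of slots.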

Avoiding those extra buffer slots requires performing CC in the same cycle

as SA. Although requests reaching SA concern output ports (not VCs), each one of

them actually refers to a different output VC. Therefore, before knowing specifically

which input VC is granted, and thus, which output VC is allocated to the winning

flit, there is no way to determine which output VC's credit counter to decrement,

unless the ids of all allocated output VCs are multiplexed to the credit counters.

This actually constitutes a complete crossbar of smaller width that would diminish

the delay-reduction benefits of pipelining, and could even lead to an overall delay

that is worse than the delay of a single-cycle VC-based router.
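
The selection step described above can be sketched behaviorally as follows. This is only an illustration under assumed data structures: the names consume_credits, grants, allocated_out_vc and credit_counters are hypothetical, and the dictionary lookup stands in for the multiplexer that would forward the winner's output VC id to the credit counters of each output port.

```python
def consume_credits(grants, allocated_out_vc, credit_counters):
    """Decrement the credit counter of the output VC used by each SA winner.

    grants:            output port -> winning (input port, input VC), or None
    allocated_out_vc:  (input port, input VC) -> output VC id assigned by VA
    credit_counters:   output port -> list of counters, one per output VC
    """
    for out_port, winner in grants.items():
        if winner is None:
            continue  # no flit leaves through this output in this cycle
        # The counter to update is known only after the winner is known:
        # this lookup plays the role of the id multiplexer.
        out_vc = allocated_out_vc[winner]
        credit_counters[out_port][out_vc] -= 1


# Example: input VC (1, 0) wins output 0 and holds output VC 2.
grants = {0: (1, 0), 1: None}
allocated_out_vc = {(1, 0): 2, (0, 1): 0}
credit_counters = {0: [4, 4, 4], 1: [4, 4, 4]}
consume_credits(grants, allocated_out_vc, credit_counters)
```

In hardware, one such selection is needed per output port and it sits after the SA grant in the same cycle, which is why it adds to the delay of the SA stage.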

9.5 Multi-stage Pipelined Organizations for VC-Based Routers

The primitive pipelined configurations presented in the previous sections cut the

operation of the router at a single pipeline point that separates the operation of the

router into two pipeline stages. For example, the pipeline register at the end of the VA stage

splits the router into two pipeline stages. The first one involves the RC and VA tasks,

while the second one includes SA in series with ST/CC. The design of a router that

operates with a faster clock frequency than the 2-stage pipelined alternatives needs

more pipeline stages. The design of deeper pipelined configurations does not need

any microarchitecture redesign but can be derived simply by stitching together the

primitive pipelined configurations presented so far. For example, adding pipeline

registers at the end of RC (similar to Sect. 9.2) and at the end of VA (similar to

Sect. 9.3) allows us to derive a 3-stage pipeline organization that executes RC in one

stage, VA in the second and SA-ST in the last pipeline stage, while the execution

of the tasks of the remaining flits is overlapped in time, thus increasing utilization

and, effectively, the router's throughput. This pipelined organization can be graphically

represented as RC|VA|SA-ST, where | denotes the placement of a pipeline register

and - represents the serial connection of two tasks in the same pipeline stage.
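
The notation maps directly to a simple timing estimate: the delay of a stage is the sum of the tasks connected with -, and the clock period is bounded by the slowest stage. The sketch below illustrates this; the task delays in EXAMPLE_DELAYS are made-up numbers in arbitrary units, not measurements from the text, and the function names are hypothetical.

```python
# Hypothetical task delays in arbitrary units (illustrative only).
EXAMPLE_DELAYS = {"RC": 3, "VA": 5, "SA": 4, "ST": 2}


def stage_delays(organization, task_delay):
    """Return the delay of each pipeline stage of e.g. 'RC|VA|SA-ST'.

    '|' separates pipeline stages; '-' chains tasks serially in one stage.
    """
    return [
        sum(task_delay[task] for task in stage.split("-"))
        for stage in organization.split("|")
    ]


def clock_period(organization, task_delay, register_overhead=0):
    """The slowest stage (plus an assumed register overhead) sets the period."""
    return max(stage_delays(organization, task_delay)) + register_overhead


periods = {
    org: clock_period(org, EXAMPLE_DELAYS)
    for org in ("RC-VA-SA-ST", "RC-VA|SA-ST", "RC|VA|SA-ST", "RC|VA|SA|ST")
}
# periods == {"RC-VA-SA-ST": 14, "RC-VA|SA-ST": 8,
#             "RC|VA|SA-ST": 6, "RC|VA|SA|ST": 5}
```

With these assumed delays the period drops from 14 to 8, then to 6 and 5 as registers are added, and it can never fall below the delay of the slowest single task (VA in this example).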

Depending on the selected configuration, multiple 3-stage pipelined alternatives

can be derived. However, depending on the actual delay profile of each task, not

every design point makes sense. In the following paragraphs, we present two

representative 3-stage pipelined organizations as well as a 4-stage version of a

pipelined router. Both the 3-stage and the 4-stage organizations increase the clock frequency of the router relative to

the single-cycle and the 2-stage pipelined organizations, but the frequency gains

of such deeper pipelined organizations diminish fast. The main reason for such

diminishing returns is the delay of the router's main tasks such as VA and SA that

set an upper bound on the maximum achievable frequency. Also, the main tasks

of each router do not stand alone but include some secondary helper tasks that,
