Also, the condition under which an output VC is considered ready should be altered
from creditCounter[i] > 0 to creditCounter[i] > 1, irrespective of the number of
added buffer slots.
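The changed ready condition can be sketched in a few lines of Python. This is only an illustrative model, not the router's actual logic: the function name output_vc_ready and the flag cc_pipelined are hypothetical, and the sketch assumes that the stricter threshold applies exactly when the credit decrement of a granted flit is performed one cycle after SA.

```python
def output_vc_ready(credit_counter, i, cc_pipelined):
    """Return True if output VC i may be requested in SA.

    cc_pipelined=True models (as an assumption of this sketch) the case
    where credit consumption happens one cycle after SA, so a counter
    value of 1 may already be claimed by a flit granted in the previous
    cycle and is therefore not treated as available.
    """
    threshold = 1 if cc_pipelined else 0
    return credit_counter[i] > threshold


# Three output VCs holding 0, 1 and 2 credits respectively.
credit_counter = [0, 1, 2]
same_cycle = [output_vc_ready(credit_counter, i, False) for i in range(3)]
pipelined = [output_vc_ready(credit_counter, i, True) for i in range(3)]
```

With the stricter condition, the VC holding a single credit is no longer reported as ready, while the VC holding two credits still is.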
Avoiding those extra buffer slots requires performing CC in the same cycle
as SA. Although requests reaching SA concern output ports (not VCs), each one of
them actually refers to a different output VC. Therefore, before knowing specifically
which input VC is granted, and thus, which output VC is allocated to the winning
flit, there is no way to determine which output VC's credit counter to decrement,
unless the ids of all allocated output VCs are multiplexed to the credit counters.
This effectively constitutes a complete crossbar of smaller width that would diminish
the delay-reduction benefits of pipelining, and could even yield an overall delay
that is worse than that of a single-cycle VC-based router.
9.5 Multi-stage Pipelined Organizations for VC-Based Routers
The primitive pipelined configurations presented in the previous sections cut the
operation of the router at a single pipeline point, separating it into two pipeline
stages. For example, a pipeline register at the end of the VA stage
splits the router into two pipeline stages. The first one involves the RC and VA tasks,
while the second one includes SA in series with ST/CC. A router that
operates at a higher clock frequency than the 2-stage pipelined alternatives needs
more pipeline stages. The design of deeper pipelined configurations does not need
any microarchitecture redesign but can be derived simply by stitching together the
primitive pipelined configurations presented so far. For example, adding pipeline
registers at the end of RC (similar to Sect. 9.2) and at the end of VA (similar to
Sect. 9.3) yields a 3-stage pipeline organization that executes RC in one
stage, VA in the second, and SA-ST in the last pipeline stage, while the execution
of the tasks of the remaining flits is overlapped in time, thus increasing utilization
and, effectively, the router's throughput. This pipelined organization can be graphically
represented as RC|VA|SA-ST, where | denotes the placement of a pipelined register
and - represents the serial connection of two tasks in the same pipeline stage.
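The notation lends itself to a small worked example. The Python sketch below parses a string such as RC|VA|SA-ST into stages and estimates the clock period as the delay of the slowest stage, with tasks inside a stage adding up in series. The helper names and the task delays in TASK_DELAY are hypothetical values in arbitrary units chosen only for illustration; register overhead and the CC task are ignored.

```python
def parse_pipeline(notation):
    """Split e.g. 'RC|VA|SA-ST' into [['RC'], ['VA'], ['SA', 'ST']].

    '|' marks a pipeline register, '-' a serial connection of two tasks
    inside the same pipeline stage.
    """
    return [stage.split("-") for stage in notation.split("|")]


def clock_period(notation, task_delay):
    """Clock period = delay of the slowest stage (serial tasks add up)."""
    stages = parse_pipeline(notation)
    return max(sum(task_delay[task] for task in stage) for stage in stages)


# Hypothetical task delays in arbitrary units (not taken from the text).
TASK_DELAY = {"RC": 2, "VA": 5, "SA": 4, "ST": 2}

single_cycle = clock_period("RC-VA-SA-ST", TASK_DELAY)
two_stage = clock_period("RC-VA|SA-ST", TASK_DELAY)
three_stage = clock_period("RC|VA|SA-ST", TASK_DELAY)
four_stage = clock_period("RC|VA|SA|ST", TASK_DELAY)
```

Under these made-up delays the clock period shrinks from 13 (single cycle) to 7, 6, and 5 units as pipeline registers are added. Further cuts cannot go below the delay of the slowest single task, which in this sketch is VA.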
Depending on the selected configuration, multiple 3-stage pipelined alternatives
can be derived. However, depending on the actual delay profile of each task, not
every design point makes sense. In the following paragraphs, we present two
representative 3-stage pipelined organizations as well as a 4-stage version of a
pipelined router. Both the 3-stage and the 4-stage organizations increase the clock
frequency of the router relative to the single-cycle and the 2-stage pipelined
organizations, but the frequency gains of such deeper pipelined organizations diminish
quickly. The main reason for these diminishing returns is that the delays of the router's
main tasks, such as VA and SA, set an upper bound on the maximum achievable frequency.
Also, the main tasks
of each router do not stand alone but include some secondary helper tasks that,