Primitive Cases
The derived results can be applied even to the simple EBs presented at the beginning
of this chapter. An equivalent flow-control model for a 2-slot EB that operates under
ready/valid handshake experiences a forward latency of $L_f = 1$ due to the register
present at the output of the HBEBs and a backward latency of $L_b = 1$, since the
ready signal is produced by the full flags of the HBEBs. According to the analysis
presented, this configuration needs $L_f + L_b - 1 = 1$ buffer for lossless operation
and 2 times that for 100% throughput, as already supported by the 2-slot EB. The
configuration that uses only 1 slot, while keeping $L_f$ and $L_b$ equal to 1, corresponds
to the HBEB that offers lossless operation while allowing only 50% of link-level
throughput.
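The 1-slot versus 2-slot behavior described above can be checked with a small cycle-level model. The following is only a sketch under simplifying assumptions: the sender always has a word to send, the receiver is always ready, and the buffer derives ready (not full) and valid (not empty) from its registered occupancy at the start of each cycle. The function name `simulate_eb` and its parameters are hypothetical and do not come from the text.

```python
def simulate_eb(slots, cycles=1000):
    """Return the fraction of cycles in which a word leaves the buffer.

    Assumed model (not from the text): ready and valid are functions of the
    occupancy registered at the start of the cycle, so L_f = L_b = 1.
    """
    count = 0        # words currently stored in the buffer
    delivered = 0    # words handed to the receiver so far
    for _ in range(cycles):
        ready_in = count < slots   # what the sender observes this cycle
        valid_out = count > 0      # what the receiver observes this cycle
        enqueue = ready_in         # sender always has data available
        dequeue = valid_out        # receiver is always ready
        count += int(enqueue) - int(dequeue)
        assert 0 <= count <= slots  # lossless: the buffer never overflows
        delivered += int(dequeue)
    return delivered / cycles


if __name__ == "__main__":
    print("1 slot :", simulate_eb(1))
    print("2 slots:", simulate_eb(2))
```

In this model the 1-slot buffer alternates between filling and draining, so it delivers a word every other cycle, while the 2-slot buffer accepts and delivers a word in the same cycle once it holds one entry.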
The derived model does not cover the degenerate case of 1-slot pipelined and
bypass EBs. For example, even if the pipelined EB has $L_b = 0$ (fully combinational
backpressure propagation) and $L_f = 1$, the ready backpressure signal spans
multiple stages of buffering and extends beyond the borders of a single sender-receiver pair.
2.5.2 Pipelined Links with Elastic Buffers
As shown so far, the use of simple pipeline registers between two flow-controlled
endpoints increases the round-trip time of the flow control mechanism and
necessitates the use of additional buffering at the receiver to accommodate all in-flight
words. In a NoC environment, it is possible and also desirable to replace the forward
and backward pipeline registers with flow-controlled EB stages, thus limiting the
flow-control notification cycle per stage (Concer et al. 2008; Michelogiannakis and
Dally 2013).
Figure 2.16a shows a pipelined link that uses only pipeline registers and needs
10 buffers at the receiver for achieving 100% throughput and safe operation. Recall
that in this pipelined configuration the sender sets valid = 1 when it observes locally
a ready signal equal to 1, to avoid the receiver writing by mistake multiple copies
of the same word. If one stage of the pipeline is transformed to an EB, as shown in
Fig. 2.16b, then the round-trip time at the second part of the link reduces by 2 and
thus a buffer with 6 slots suffices for the receiver. By extending this approach to all
pipeline stages, the same operation can be achieved by the architecture shown in
Fig. 2.16c, where the pipelined link consists of only EBs. In this case, the buffer at
the receiver can be just a 2-slot EB, since it experiences a local $L_f = L_b = 1$ at
the last stage of the link. Overall, this strategy both isolates the timing
paths, by registering the data and the backpressure signals, and reduces the
total number of buffers required for lossless and full-throughput communication. In
fact, in this example, using only 3 stages of 2-slot EBs on the link and one 2-slot EB at
the receiver achieves the same behavior as the baseline case of Fig. 2.16a, using 8
buffers in total that are distributed at the receiver and on the link.
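To make the all-EB configuration concrete, the following sketch models a chain of 2-slot EBs (3 on the link plus 1 at the receiver, as in the example above) and checks that words arrive in order, without loss, and at one word per cycle when the sink never stalls. It is an illustrative model only: the function `run_eb_chain`, its parameters, and the random stall pattern of the sink are assumptions introduced here, and each stage is assumed to compute ready (not full) and valid (not empty) from its occupancy at the start of the cycle.

```python
import random
from collections import deque


def run_eb_chain(num_words, stages=4, slots=2, sink_ready_prob=1.0, seed=0):
    """Push num_words through a chain of EBs; return (received, cycles)."""
    rng = random.Random(seed)
    queues = [deque() for _ in range(stages)]
    received = []
    next_word = 0
    cycles = 0
    while len(received) < num_words:
        # Snapshot of the handshake signals, based on start-of-cycle state.
        ready = [len(q) < slots for q in queues]
        valid = [len(q) > 0 for q in queues]
        sink_ready = rng.random() < sink_ready_prob
        # Last stage to sink.
        if valid[-1] and sink_ready:
            received.append(queues[-1].popleft())
        # Stage i to stage i+1, visited back to front so that each queue
        # still holds its start-of-cycle contents when it is dequeued.
        for i in range(stages - 2, -1, -1):
            if valid[i] and ready[i + 1]:
                queues[i + 1].append(queues[i].popleft())
        # Sender to first stage (sender has data until all words are sent).
        if next_word < num_words and ready[0]:
            queues[0].append(next_word)
            next_word += 1
        assert all(len(q) <= slots for q in queues)  # no overflow anywhere
        cycles += 1
    return received, cycles


if __name__ == "__main__":
    words, cycles = run_eb_chain(100)
    print("2-slot EBs, sink always ready:", cycles, "cycles")
    words, cycles = run_eb_chain(100, slots=1)
    print("1-slot buffers, sink always ready:", cycles, "cycles")
    words, cycles = run_eb_chain(100, sink_ready_prob=0.5)
    print("2-slot EBs, stalling sink: in order =", words == list(range(100)))
```

With the default arguments the chain holds `stages * slots = 8` slots in total, matching the buffer count of the example. Setting `slots=1` in the same model roughly halves the throughput, which mirrors the HBEB behavior discussed for the primitive cases.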