a fully combinational logic path. This characteristic is a limiting factor in terms of
delay since in large pipelines of EBs, possibly spanning across many NoC routers,
the delay due to the combinational propagation of the handshake signals may exceed
the available delay budget.
Alternatively, a 2-slot EB can be designed by connecting in series
a bypass EB and a pipelined EB. This organization leads to the designs
presented in Cortadella et al. (2006), where the 2-slot EBs were derived using
FSM logic synthesis. The same paper also showed how to implement a
2-slot EB using 2 latches in series, a main and an auxiliary one, by appropriately
controlling the clock phases and the transparency of each latch.
2.2 Generic FIFO Queues
Even if the sender and the receiver can be “synchronized” by exchanging the
necessary flow control information via the ready/valid signals, the designer still
needs to answer several critical questions. For example, how can we keep the sender
busy while the receiver is stalled? The direct answer to this question is to replace
simple 2-slot EBs with larger FIFO buffers that will store many more incoming
words and implement the same handshake. In this way, when the receiver is stalled
the sender can be kept busy for some extra cycles. If the receiver remains stalled for
a long period of time then inevitably all the slots of the buffer will be occupied and
the sender should be informed and stop transmission. Actually, FIFOs are needed to
absorb any bursty incoming traffic at the receiver and effectively increase the overall
throughput, since the network can host a larger number of words per channel before
being stalled.
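
The effect of buffer depth on a stalled link can be illustrated with a minimal behavioral sketch. The function name and the traffic model are assumptions made only for illustration: the sender offers one word per cycle, the buffer starts empty, and the receiver pops nothing during the stall.

```python
def words_accepted_during_stall(depth, stall_cycles):
    """Count the words a sender can push while the receiver is stalled.

    Hypothetical model: one word is offered per cycle, the buffer asserts
    ready while it has a free slot, and nothing is popped during the stall.
    """
    occupancy = 0
    accepted = 0
    for _ in range(stall_cycles):
        ready = occupancy < depth   # a free slot is available
        if ready:
            occupancy += 1          # the push succeeds in this cycle
            accepted += 1
        # otherwise the sender is stalled for this cycle
    return accepted


# Compare a 2-slot EB with deeper FIFOs for a 6-cycle receiver stall.
for depth in (2, 4, 8):
    print(depth, words_accepted_during_stall(depth, stall_cycles=6))
```

In this simplified model the sender stays busy for as many cycles as there are free slots, so a deeper buffer hides a longer stall before the backpressure reaches the sender.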
Larger FIFOs can be designed by adding more HBEBs in parallel and by
enhancing the tail and head pointers to address a larger set of buffer positions
for a push or a pop, respectively (Fig. 2.10). When new data are pushed in the
FIFO they are written in the position indexed by the tail pointer; in the same cycle
the tail pointer is increased (modulo the size of the FIFO buffer) pointing to the
next available buffer. Similarly, when data are popped from the FIFO, the
selected EB is indexed by the head pointer. During the dequeue, the head pointer
is increased (modulo the size of the FIFO buffer). The ready and valid signals sent
outside the FIFO are generated in exactly the same way as in the case of the 2-slot
EB. If the head and tail pointers follow one-hot encoding, their increment
operation does not include any logic and can be implemented using a simple cyclic
shift register (ring counter).
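
The pointer handling described above can be sketched as a small software model. This is a behavioral illustration, not the hardware design itself: the class name `OneHotFifo`, the occupancy counter, and the rule "valid when not empty, ready when not full" are assumptions made for the sketch, and push and pop are modeled as separate calls rather than as same-cycle events.

```python
class OneHotFifo:
    """Behavioral model of a FIFO with one-hot head and tail pointers."""

    def __init__(self, depth):
        self.depth = depth
        self.slots = [None] * depth
        self.head = 1    # one-hot: bit i set means slot i is selected
        self.tail = 1
        self.count = 0   # occupancy, used here only to derive ready/valid

    def _rotate(self, ptr):
        # Cyclic left shift by one position within `depth` bits,
        # i.e. one step of a ring counter.
        mask = (1 << self.depth) - 1
        return ((ptr << 1) | (ptr >> (self.depth - 1))) & mask

    @property
    def ready(self):
        return self.count < self.depth   # assumed: ready when not full

    @property
    def valid(self):
        return self.count > 0            # assumed: valid when not empty

    def push(self, word):
        if not self.ready:
            return False
        self.slots[self.tail.bit_length() - 1] = word  # slot selected by tail
        self.tail = self._rotate(self.tail)
        self.count += 1
        return True

    def pop(self):
        if not self.valid:
            return None
        word = self.slots[self.head.bit_length() - 1]  # slot selected by head
        self.head = self._rotate(self.head)
        self.count -= 1
        return word


fifo = OneHotFifo(depth=4)
pushed = [fifo.push(w) for w in range(5)]  # the fifth push finds the FIFO full
first_two = [fifo.pop(), fifo.pop()]
print(pushed, first_two, bin(fifo.head), bin(fifo.tail))
```

After four pushes the tail pointer has wrapped around to its initial position, which is the modulo behavior the ring counter provides without any adder.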
Designing a FIFO queue using multiple HBEBs in parallel can scale efficiently
to multiple queue positions. However, the read (pop) path of the queue involves a
large multiplexer that induces a non-negligible delay overhead. This read path can
be completely isolated by adding a 2-slot EB at the output of the parallel FIFO,
supported also by the appropriate bypass logic shown in Fig. 2.11. When the FIFO
is empty, data are written to the frontmost 2-slot EB. The parallel FIFO starts to fill