may arrive. Therefore, if the receiver provides exactly $L_f + L_b$ buffer slots,
ready is asserted only when the buffer at the receiver is empty, i.e., when
freeSlots $= L_f + L_b$.
Although this configuration allows for lossless operation, it experiences limited
transmission throughput. For example, assume that the receiver is full, storing
$L_f + L_b$ words, and stalled. Once the stall condition is removed, the receiver starts
dequeuing one word per cycle. The ready signal is equal to 0 until all $L_f + L_b$
words are drained. In the meantime, although free slots exist at the receiver, they
are left unused until the sender is notified that the stall is over and new words can
be accepted. After $L_f + L_b$ cycles, all $L_f + L_b$ slots are emptied and the ready
signal is set to 1. However, any new words will only arrive after $L_f + L_b$ cycles.
During this time frame the receiver remains idle, having its buffer empty. Therefore,
in a time frame of $2(L_f + L_b)$ cycles the receiver was able to drain $L_f + L_b$
words.
This behavior translates to a throughput of 50 %.
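The timeline above can be replayed with a small cycle-level sketch. It is only an illustration under the assumptions stated in the text: the receiver starts full, the stall is lifted at cycle 0, one word is dequeued per cycle, and the first new word arrives one round trip ($L_f + L_b$ cycles) after ready rises. The function name `baseline_recovery` and its parameters are hypothetical, chosen for this example.

```python
from fractions import Fraction


def baseline_recovery(l_f, l_b):
    """Replay the recovery of a full receiver that has exactly L_f + L_b slots.

    Returns (words_drained, window_cycles, throughput) for the window of
    2 * (L_f + L_b) cycles that starts when the stall is removed.
    """
    round_trip = l_f + l_b
    occupancy = round_trip          # assumption: buffer starts full
    ready_cycle = None              # cycle at which ready becomes 1
    drained = 0
    window = 2 * round_trip

    for cycle in range(window):
        # New words can only show up one round trip after ready was asserted.
        if ready_cycle is not None and cycle >= ready_cycle + round_trip:
            occupancy += 1
        # The receiver dequeues one word per cycle whenever it holds one.
        if occupancy > 0:
            occupancy -= 1
            drained += 1
        # Baseline rule: ready = 1 only once the buffer is completely empty.
        if ready_cycle is None and occupancy == 0:
            ready_cycle = cycle + 1

    return drained, window, Fraction(drained, window)


drained, window, throughput = baseline_recovery(l_f=2, l_b=2)
print(f"{drained} words in {window} cycles -> throughput {throughput}")
```

With $L_f = L_b = 2$ the receiver drains its four stored words in the first four cycles and then sits idle for the remaining four, which is the 50 % figure derived above.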
More throughput can be gained by increasing the buffer size of the receiver to
$L_f + L_b + k$ positions. In this scenario we can relax the condition for the assertion
of the ready signal to: ready $= 1$ when freeSlots $\geq L_f + L_b$ (from just equality in the
baseline case). Therefore, if the buffer at the receiver is full with $L_f + L_b + k$ words
at time $t_0$, $L_f + L_b$ words should leave to allow the ready signal to return to one.
$L_f + L_b$ cycles later the first new words will arrive due to the assertion of the ready
signal. In the meantime the receiver will be able to drain $k$ more words. Therefore,
the throughput seen at the output of the receiver is
$\frac{L_f + L_b + k}{2(L_f + L_b)}$. The throughput can
reach 100 % when $k = L_f + L_b$; the receiver has $2(L_f + L_b)$ buffer slots and
ready is asserted when the number of empty slots is at least $L_f + L_b$.
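As a quick numeric check of the throughput expression above, the following sketch evaluates it for a range of extra slots $k$. The function name `recovery_throughput` is hypothetical, and capping the result at 1 for $k > L_f + L_b$ is an assumption added here, on the grounds that the receiver dequeues at most one word per cycle.

```python
from fractions import Fraction


def recovery_throughput(l_f, l_b, k):
    """Throughput (L_f + L_b + k) / (2 * (L_f + L_b)), capped at 1.

    k is the number of buffer slots added on top of the L_f + L_b baseline.
    """
    if l_f < 1 or l_b < 1 or k < 0:
        raise ValueError("expected l_f >= 1, l_b >= 1 and k >= 0")
    round_trip = l_f + l_b
    # Assumption: extra slots beyond k = L_f + L_b cannot push throughput above 1.
    return min(Fraction(1), Fraction(round_trip + k, 2 * round_trip))


# Sweep k from the baseline (k = 0) up to k = L_f + L_b for L_f = L_b = 2.
for k in range(0, 5):
    print(f"k = {k}: throughput = {recovery_throughput(2, 2, k)}")
```

Each extra slot lets the receiver drain one more word while it waits for the round trip to complete, so throughput rises linearly from 1/2 at $k = 0$ to 1 at $k = L_f + L_b$.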
The derived bounds hold for the general case. However, if we take into account
some small details that are present in most real implementations, the derived bounds
can be relaxed, showing that the ready/valid handshake protocol achieves full
throughput with slightly smaller buffer requirements.
First, the minimum number of buffers required to achieve lossless operation can
drop from $L_f + L_b$ to $L_f + L_b - 1$. This reduction is achieved since a ready = 0
that reaches the sender can directly stop the transmission of a new word (as shown
by the dotted lines of Fig. 2.15) at the output of the sender. Therefore, the actual
in-flight words in the forward path are $L_f - 1$ and not $L_f$, since the last one is actually
stopped at the output register of the sender itself. Therefore, the ready signal out of
the receiver is computed as follows: ready $= 1$ when freeSlots $\geq L_f + L_b - 1$, else 0.
Second, when a new word is dequeued from the receiver the slot counter is
updated in the same cycle. In this case, when a receiver with $k + (L_f + L_b - 1)$
buffers is full and starts dequeuing one word per cycle, it will declare its readiness
at the same time that it dequeues the $(L_f + L_b - 1)$th word. The first new word
will arrive $L_f + L_b - 1$ cycles later. Thus, during $2(L_f + L_b - 1)$ clock cycles the
receiver can drain $k + (L_f + L_b - 1)$ words. When $k = L_f + L_b - 1$ the receiver
can achieve 100 % throughput; when the $(L_f + L_b - 1)$th word is dequeued the first
new word is enqueued, thus leaving no gaps at the receiver's buffer.
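The relaxed ready rule and the resulting throughput can be sketched in the same way. The names `refined_ready` and `refined_throughput` are hypothetical; the code only restates the expressions from the two observations above, and the cap at 1 and the requirement $L_f + L_b \geq 2$ are assumptions added so that the arithmetic is well defined.

```python
from fractions import Fraction


def refined_ready(free_slots, l_f, l_b):
    """ready = 1 when freeSlots >= L_f + L_b - 1, else 0."""
    return 1 if free_slots >= l_f + l_b - 1 else 0


def refined_throughput(l_f, l_b, k):
    """Throughput (k + L_f + L_b - 1) / (2 * (L_f + L_b - 1)), capped at 1."""
    in_flight = l_f + l_b - 1
    if in_flight < 1 or k < 0:
        raise ValueError("expected l_f + l_b >= 2 and k >= 0")
    # Assumption: the receiver never drains more than one word per cycle.
    return min(Fraction(1), Fraction(k + in_flight, 2 * in_flight))


l_f, l_b = 2, 2
full_rate_slots = 2 * (l_f + l_b - 1)   # buffer size with k = L_f + L_b - 1
print("ready with 3 free slots:", refined_ready(3, l_f, l_b))
print("ready with 2 free slots:", refined_ready(2, l_f, l_b))
print("slots for full throughput:", full_rate_slots)
print("throughput at k = 3:", refined_throughput(l_f, l_b, 3))
```

For $L_f = L_b = 2$ this gives 6 slots for full throughput, compared with the 8 slots required by the general bound $2(L_f + L_b)$.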