Hardware Reference
In-Depth Information
• They set the outLock bit of the winning input. In the following cycles, the body and
tail flits do not need to qualify their requests again with outAvailable; their requests are
driven directly from the outLock bit, provided that they have valid data to send.
• They drive the ready_in signals of the inputs. The assertion of the appropriate
ready_in signal causes a dequeue operation at the corresponding input buffer,
since both its valid_out and its ready_in signals are asserted in the same cycle.
The inputs that did not win see ready_in = 0 and thus keep their
data in their buffers.
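The request qualification and the dequeue handshake described above can be sketched behaviorally in Python. This is only an illustrative model for a single output, not the book's RTL: the function names generate_requests and dequeue_winner, the is_head flags, and the use of out_ready as a qualifier for body and tail requests are assumptions made for the example.

```python
from collections import deque


def generate_requests(valid, is_head, out_lock, out_available, out_ready):
    """Per-input request bits for one output (hypothetical helper).

    Head flits are qualified with outAvailable; body and tail flits are
    driven from the outLock bit of their input. Qualifying both with
    out_ready is an assumption of this sketch.
    """
    requests = []
    for v, head, lock in zip(valid, is_head, out_lock):
        head_request = v and head and out_available and out_ready
        locked_request = v and lock and out_ready
        requests.append(bool(head_request or locked_request))
    return requests


def dequeue_winner(buffers, grant):
    """Drive ready_in from the grant vector and pop the winner's flit.

    A buffer dequeues only when its valid_out (non-empty) and its
    ready_in are both asserted in the same cycle.
    """
    ready_in = list(grant)
    sent = None
    for i, buf in enumerate(buffers):
        valid_out = len(buf) > 0
        if valid_out and ready_in[i]:
            sent = buf.popleft()
    return ready_in, sent
```

With two inputs holding flits and a grant of [False, True], only the second buffer is dequeued; the first sees ready_in = 0 and keeps its data.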
When the tail flit leaves the source, it de-allocates the per-input and per-output
state bits, outLock[i] and outAvailable respectively, by driving them to their free
state. Once outAvailable is asserted, the inputs with valid head flits can try to win
arbitration and lock the output for themselves, provided that there is buffer space available
at the output.
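The allocation and de-allocation of the two state variables can be written as a small next-state function. The sketch below is a behavioral assumption rather than the text's implementation: update_state, is_head and is_tail are hypothetical names, and a single-flit packet is assumed to carry both flags at once.

```python
def update_state(grant, is_head, is_tail, out_lock, out_available):
    """Next-state values of outLock[i] and outAvailable for one output.

    A granted head flit locks the output for its input; a granted tail
    flit returns both bits to their free state.
    """
    next_lock = list(out_lock)
    next_available = out_available
    for i, granted in enumerate(grant):
        if not granted:
            continue
        if is_head[i]:
            next_lock[i] = True      # output now belongs to input i
            next_available = False   # other head flits must wait
        if is_tail[i]:
            next_lock[i] = False     # release the per-input lock
            next_available = True    # output is free again
    return next_lock, next_available


# Head, body and tail of a three-flit packet from input 1 of two inputs.
lock, avail = update_state([False, True], [True, True], [False, False],
                           [False, False], True)
lock, avail = update_state([False, True], [False, False], [False, False],
                           lock, avail)
lock, avail = update_state([False, True], [False, False], [False, True],
                           lock, avail)
```

After the three calls, lock is [False, False] and avail is True, so a waiting head flit at input 0 can request the output in the next cycle.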
Using this simple configuration, arbitration is actually performed in each cycle
for all flits. However, once outAvailable = 0, meaning that the output has been
allocated to a specific input, and outLock[i] = 1, meaning that the selected input is
the i-th one, only the requests of that input reach the arbiter. The requests of
the remaining inputs are nullified while they wait for the output to be released. In the meantime,
the arbiter always grants input i and updates its priority to position i + 1 (next in
round-robin order, so that input i has the least priority in the next cycle). For the
duration of a packet from a specific input, the priority of the arbiter always returns
to the same position, since only one (and the same) request is active every
cycle. Once the output is released by the tail of the packet, the priority moves to
a different input depending on which input is eventually granted.
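The priority behavior during a locked packet can be checked with a minimal round-robin arbiter model. This is a sketch under the assumption that the priority pointer names the input searched first and moves to the position after the granted input; round_robin_arbiter is a hypothetical name.

```python
def round_robin_arbiter(requests, priority):
    """Grant the first active request at or after `priority`.

    Returns (granted_index, next_priority). The next priority is the
    position after the winner, so the winner has the least priority in
    the following cycle. With no requests, the priority is unchanged.
    """
    n = len(requests)
    for offset in range(n):
        i = (priority + offset) % n
        if requests[i]:
            return i, (i + 1) % n
    return None, priority


# Input 2 holds the output lock, so only its request reaches the arbiter.
priority = 0
history = []
for _ in range(3):
    granted, priority = round_robin_arbiter([False, False, True, False],
                                            priority)
    history.append((granted, priority))
```

In every cycle input 2 is granted and the priority settles at position 3, which matches the observation that the pointer keeps returning to the same position while a single request is active.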
In many real cases, it is necessary to isolate the timing path of the link from that
of the arbitration and multiplexing. The obvious choice is to add an EB, preferably
with 2 slots, that isolates the timing paths and provides additional buffering
space, i.e., outgoing data can stop independently at the output of the multiplexer.
In this configuration, shown in Fig. 3.3, the ready signals of the output that were
used as qualifiers in the example of Fig. 3.2 are replaced by the ready signals
of the intermediate EB. The rest of the request generation logic remains the same, and
the outAvailable flag is updated when a head/tail flit passes the output of the
multiplexer and moves to the intermediate EB.
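A 2-slot EB of this kind can be modeled behaviorally as follows. The class ElasticBuffer and its cycle method are hypothetical names for this sketch; the key assumption is that ready depends only on the occupancy stored at the start of the cycle, not on the downstream ready in the same cycle, which is what decouples the two timing paths.

```python
from collections import deque


class ElasticBuffer:
    """Behavioral model of a small ready/valid elastic buffer."""

    def __init__(self, slots=2):
        self.slots = slots
        self.fifo = deque()

    @property
    def ready(self):
        # Seen by the arbitration/multiplexing side as the output qualifier.
        return len(self.fifo) < self.slots

    @property
    def valid(self):
        # Seen by the link side.
        return len(self.fifo) > 0

    def cycle(self, valid_in, data_in, ready_out):
        """Advance one clock cycle; return the flit sent on the link or None."""
        # Both handshakes are evaluated on the state before any update.
        do_write = valid_in and self.ready
        do_read = self.valid and ready_out
        sent = self.fifo.popleft() if do_read else None
        if do_write:
            self.fifo.append(data_in)
        return sent


eb = ElasticBuffer()
eb.cycle(True, "head", False)          # link stalled, flit stored
eb.cycle(True, "body", False)          # second slot filled, EB now full
sent = eb.cycle(True, "tail", True)    # "head" leaves; "tail" is refused
```

In the last cycle the EB was full when sampled, so the tail flit stays upstream and is accepted one cycle later, while the head flit moves to the link.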
Fig. 3.3 The addition of a local output EB isolates the operation of the switching module from
link traversal. [Figure labels: outLock, outAvailable, request generation, local flow control, arb, Input #0, Intermediate EB, Link (ready, valid, data), Output Buffer, from/to other input(s)]
 