Equivalently, the slot counter can be moved to the output of the switching module
(at the other side of the link) and act as a local output credit counter, as shown in
Fig. 3.4b. The output credit counter mirrors the available buffer slots of the output.
It sends a ready signal to all inputs when the number of available buffer slots at the
output buffer is greater than zero. The inputs qualify their valid signals exactly the
same way as in the case of the ready/valid handshake. Therefore, when a certain
input is connected to the output (the output was available and the arbiter granted the
particular input), it knows exactly about the availability of new credits at the output
via the output credit counter.
It should be noted that the ready signal, which is asserted when creditCounter > 0,
is driven only by the current state of the credit counter. The credit decrement and
increment signals update only the value of the credit counter, and the new value
will be seen by the ready signal in the next clock cycle. Therefore, the dependency
cycle formed by credit decrement → ready → request generation → arbiter's grant →
credit decrement is broken after the ready signal, which also helps in isolating
the timing paths starting from the request generation logic. Equivalently, each input
buffer, independently of the rest, also sends its own credit update in the backward
direction once it dequeues a new flit.
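The registered behaviour of the ready signal can be illustrated with a small cycle-level model. This is only a sketch under stated assumptions: the names OutputCreditCounter, arbitrate, and switch_cycle are hypothetical, the arbiter is a plain fixed-start search chosen for brevity, and each call to switch_cycle stands for one clock cycle.

```python
class OutputCreditCounter:
    """Cycle-level model of an output credit counter (hypothetical names)."""

    def __init__(self, buffer_slots):
        self.max_credits = buffer_slots
        self.credits = buffer_slots  # registered state: free slots at the output buffer

    @property
    def ready(self):
        # ready is a function of the registered value only; it never looks at
        # the decrement/increment inputs of the current cycle.
        return self.credits > 0

    def clock(self, decrement, increment):
        """Clock edge: apply this cycle's updates, visible from the next cycle on."""
        if decrement and not self.ready:
            raise ValueError("grant issued without an available credit")
        self.credits += int(increment) - int(decrement)
        assert 0 <= self.credits <= self.max_credits


def arbitrate(requests, start=0):
    """Assumed arbiter: grant the first requester at or after position `start`."""
    n = len(requests)
    for offset in range(n):
        idx = (start + offset) % n
        if requests[idx]:
            return idx
    return None


def switch_cycle(counter, valids, credit_return):
    """One clock cycle of the output: returns (ready, granted input or None)."""
    ready = counter.ready                      # sampled from the registered state
    requests = [v and ready for v in valids]   # inputs qualify valid with ready
    grant = arbitrate(requests)
    counter.clock(decrement=grant is not None, increment=credit_return)
    return ready, grant
```

With a single buffer slot, a grant in one cycle deasserts ready in the following cycle, and a credit that returns while ready is low only becomes visible one cycle later, which is the point at which the dependency loop is cut.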
Using the output credit counter also simplifies the addition of pipeline stages on
the link. For example, in Fig. 3.4c the output of the multiplexer is isolated by a simple
pipeline register, i.e., outgoing data cannot stop at this point, and the readiness of
the output buffer is handled via the output credit counter. As also described in the
previous chapter for a single point-to-point link, even if additional pipeline
stages are added between the inputs and the output, provided that the ready signal is
consumed by the input without any further delay, the credit protocol guarantees maximum
throughput with the least buffering requirements. In this case, the receiver needs
to provide 3 buffer slots to absorb the in-flight traffic due to the increased forward
and backward latency, L_f = 2 and L_b = 2.
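The relation between buffer slots and sustained throughput on a pipelined link can be explored with a simple simulation. The sketch below is an assumption-laden model, not the design of the figure: the function link_throughput and its parameters are hypothetical, the sender always has a flit to send, the receiver drains one flit per cycle as soon as it arrives, and a returning credit can be reused in the cycle it reaches the sender. Cycle-counting conventions vary (for instance, whether the credit counter's own register is counted as part of the backward latency), so the slot counts produced by this model need not match the figure's numbers one for one.

```python
from collections import deque


def link_throughput(slots, forward_latency, backward_latency, cycles=1000):
    """Fraction of cycles in which a flit is delivered over a pipelined link.

    Assumed model: both latencies are >= 1 cycle and are implemented as
    shift registers (one deque entry per pipeline stage).
    """
    credits = slots
    fwd = deque([0] * forward_latency)    # flits in flight towards the receiver
    bwd = deque([0] * backward_latency)   # credit updates in flight to the sender
    delivered = 0
    for _ in range(cycles):
        arriving_flit = fwd.popleft()     # flit reaching the receiver this cycle
        arriving_credit = bwd.popleft()   # credit reaching the sender this cycle
        credits += arriving_credit
        delivered += arriving_flit
        bwd.append(arriving_flit)         # dequeued at once, so a credit goes back
        send = 1 if credits > 0 else 0    # sender transmits whenever it holds a credit
        credits -= send
        fwd.append(send)
    return delivered / cycles
```

In this model the round trip is forward_latency + backward_latency cycles. Providing that many slots keeps the link busy every cycle after the initial fill, while fewer slots cap the throughput at roughly slots divided by the round trip; for example, link_throughput(2, 3, 2) evaluates to 0.4.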
3.1.2 Granularity of Buffer Allocation
Under the WH switching principle, each flit of a packet can move to the output provided
that at least one credit is available. In contrast, VCT requires flow control to
extend to the packet level, allocating buffering resources at packet granularity.
In both cases the flits of the packets are not interleaved at the output. Interleaving is
enabled by virtual channels that will be presented in the following chapters.
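The difference between the two allocation granularities can be sketched as an admission check on the output credit count. The helper names can_advance and flits_sent are hypothetical, WH and VCT are read as wormhole and virtual cut-through, and the example assumes that no credits return while the packet is being sent.

```python
def can_advance(credits, flits_remaining, is_head, policy):
    """Return True if the next flit of a packet may move to the output.

    WH:  every flit needs just one available credit.
    VCT: the head flit needs credits for the whole packet; body and tail
         flits then follow using the space the head already secured.
    """
    if policy == "WH":
        return credits >= 1
    if policy == "VCT":
        needed = flits_remaining if is_head else 1
        return credits >= needed
    raise ValueError("unknown policy: " + policy)


def flits_sent(packet_len, credits, policy):
    """Count how many flits of one packet leave before the sender stalls.

    Assumes no credits are returned during the transfer.
    """
    sent = 0
    for i in range(packet_len):
        if not can_advance(credits, packet_len - i, i == 0, policy):
            break
        credits -= 1   # each flit that leaves consumes one credit
        sent += 1
    return sent
```

For a 4-flit packet and 2 free slots, flits_sent(4, 2, "WH") returns 2, so the packet ends up spread across the two buffers, whereas flits_sent(4, 2, "VCT") returns 0 because the head waits until the whole packet fits. With 4 free slots both policies send all 4 flits.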
In packet-based flow control, which is commonly used in off-chip networks,
both the channels and the buffers are allocated in units of packets, while flit-based
flow control allocates both resources in units of flits. On-chip networks have often
used flit-based flow control. The main difference between packet-level and flit-level
flow control is in how the buffer resource is allocated. With packet-based flow
control, before any packet moves to an output, the buffer for the entire packet needs
to be allocated; thus, for a packet of L flits, an input needs to obtain L credits before