Baseline Switching Modules and Routers - Microarchitecture of Network-on-Chip Routers

the packet can be sent. Once the buffer for the entire packet has been allocated,
the channel resource can be allocated at flit granularity. A multi-flit packet can be
interrupted during transmission from input to output; the packet will not necessarily
be sent continuously. However, when the head flit arrives at an input, it reserves the
next L slots in the output buffer so that the whole packet can be kept at the output
in case of a downstream stall.

Even when flit-level flow control is used, buffers can be allocated at the packet
level by employing atomic buffer allocation. In this case, the head flit of a packet is
never buffered behind the tail flit of another packet in the same buffer. In effect,
buffers are implicitly allocated at packet granularity, even though flow control is
flit-based. This can be achieved by releasing the outAvailable flag not when the tail
flit arrives at the output buffer, but when it leaves the output buffer. In this way,
when the next head flit arrives, it finds the output buffer empty. Whenever buffers
are allocated at the packet level, the required buffer capacity equals the size of the
longest packet, which inevitably leads to low buffer utilization for short packets.
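
The difference between the two release policies can be sketched with a small
behavioral model. This is only an illustration under stated assumptions: the class
name OutputBuffer, the string encoding of flit types, and the demo helper are
hypothetical and do not come from the text, and single-flit packets are not modeled.

```python
from collections import deque


class OutputBuffer:
    """Behavioral model of an output buffer guarded by an outAvailable flag."""

    def __init__(self, depth, atomic):
        self.depth = depth
        self.atomic = atomic        # True: release the flag when the tail leaves
        self.fifo = deque()
        self.out_available = True   # models the outAvailable flag

    def can_accept_head(self):
        # A new packet may start only if the output is free and has space.
        return self.out_available and len(self.fifo) < self.depth

    def push(self, flit_type):
        """flit_type is 'head', 'body' or 'tail' (hypothetical encoding)."""
        assert len(self.fifo) < self.depth, "buffer overflow"
        if flit_type == "head":
            assert self.out_available, "output already allocated to a packet"
            self.out_available = False
        self.fifo.append(flit_type)
        if flit_type == "tail" and not self.atomic:
            self.out_available = True   # non-atomic: release on tail arrival

    def pop(self):
        flit_type = self.fifo.popleft()
        if flit_type == "tail" and self.atomic:
            self.out_available = True   # atomic: release on tail departure
        return flit_type


def demo(atomic):
    """Return whether a new head is accepted (after tail arrival, after draining)."""
    buf = OutputBuffer(depth=4, atomic=atomic)
    for flit in ("head", "body", "tail"):
        buf.push(flit)
    after_tail_arrival = buf.can_accept_head()
    while buf.fifo:
        buf.pop()
    return after_tail_arrival, buf.can_accept_head()
```

With atomic=True, the flag is released only when the tail is popped, at which point
the buffer is empty, so the next head flit never sits behind another packet's tail.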

In the following, we adopt non-atomic buffer allocation. Whenever atomic buffer
allocation is needed, the aforementioned rules can be applied to enforce it.

3.1.3 Hierarchical Switching

Arbitration and multiplexing for reaching the output link can be performed
hierarchically by merging a group of inputs at each step and allowing one flit from
them to progress toward the output. An example of a hierarchical 1-output switch
organization is depicted in Fig. 3.5a. The main difference between hierarchical and
single-step switching is that, at each step, a 2-input arbiter and a 2-to-1 multiplexer
are enough to switch the flits between two inputs, whereas in the single-step case the
arbiter and the multiplexer must have as many inputs as the whole switching module.
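
The pairwise merging structure can be sketched as a tree of 2-input arbiters, each
paired with a 2-to-1 multiplexer. The sketch below is a simplification under stated
assumptions: it is purely combinational (no buffers at the merging points), it does
not lock an arbiter to a multi-flit packet, and the round-robin policy and the names
Arbiter2, build_tree and select are hypothetical choices made for illustration.

```python
class Arbiter2:
    """2-input round-robin arbiter: after a grant, the other input gets priority."""

    def __init__(self):
        self.priority = 0   # index of the input that wins when both request

    def grant(self, req0, req1):
        if req0 and req1:
            winner = self.priority
        elif req0:
            winner = 0
        elif req1:
            winner = 1
        else:
            return None
        self.priority = 1 - winner
        return winner


def build_tree(num_inputs):
    """One list of Arbiter2 per level; an odd leftover input bypasses the level."""
    levels, width = [], num_inputs
    while width > 1:
        levels.append([Arbiter2() for _ in range(width // 2)])
        width = (width + 1) // 2
    return levels


def select(levels, flits):
    """flits[i] is the flit offered by input i, or None. Returns the winning flit."""
    current = list(flits)
    for arbiters in levels:
        merged = []
        for k, arb in enumerate(arbiters):
            a, b = current[2 * k], current[2 * k + 1]
            g = arb.grant(a is not None, b is not None)
            merged.append(None if g is None else (a, b)[g])   # 2-to-1 multiplexer
        if len(current) % 2:
            merged.append(current[-1])   # unpaired input passes through
        current = merged
    return current[0]
```

In this sketch an N-input tree always uses N-1 two-input arbiters, for example 3 for
4 inputs, whereas a single-step switch would need one N-input arbiter and one
N-to-1 multiplexer.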

To achieve maximum flexibility and increase the throughput of the system by
allowing multiple packets to move in parallel closer to the output, we should also
modify the request generation logic of the baseline design. In the baseline case, every
input, before issuing a request to the arbiter, qualified its valid signal with the
outAvailable flag of the output and then masked the result with the ready signal
of the output buffer (see Fig. 3.2). In the hierarchical implementation this is not
possible, since there is no global arbiter to check the requests of all inputs. Instead,
we assume that each merging point can be considered a partial output with its
own outAvailable flag. In this way, the allocation and multiplexing logic designed for
the baseline case (Fig. 3.2) can be reused unchanged at each merging point, including
the outLock variable at the input of each merging step.
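
A single merging point treated as a partial output might be modeled as follows. This
is a hedged sketch, not a reproduction of Fig. 3.2: the exact request equation is an
assumption (a head flit requests only when the local outAvailable flag is set, other
flits request only when their input holds outLock, and everything is masked by
ready), and the class MergePoint with its method names is hypothetical. The flag is
released on tail arrival, following the non-atomic policy adopted above.

```python
class MergePoint:
    """One 2-input merging step treated as a partial output (illustrative model)."""

    def __init__(self):
        self.out_available = True        # per-merging-point outAvailable flag
        self.out_lock = [False, False]   # per-input outLock variables
        self.priority = 0                # tie-break for the 2-input arbiter

    def requests(self, valid, is_head, ready):
        # Assumed rule: heads need a free partial output, other flits need the lock.
        return [
            valid[i] and ready and
            (self.out_lock[i] or (is_head[i] and self.out_available))
            for i in range(2)
        ]

    def step(self, valid, is_head, is_tail, ready):
        """Return the granted input index for this cycle, or None."""
        req = self.requests(valid, is_head, ready)
        if not any(req):
            return None
        winner = self.priority if all(req) else req.index(True)
        self.priority = 1 - winner
        if is_head[winner]:
            self.out_lock[winner] = True     # input now owns this partial output
            self.out_available = False
        if is_tail[winner]:
            self.out_lock[winner] = False
            self.out_available = True        # non-atomic release on tail arrival
        return winner
```

Because every merging point keeps its own flags, a packet can occupy one merging
point while a different packet advances through another, which is what lets several
packets move toward the output in parallel.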
