Link Efficiency Mechanisms

The main link efficiency mechanisms deployed today are based on compression and fragmentation. There are several types of compression: link compression, Layer 2 payload compression, RTP header compression, and TCP header compression. Fragmentation is usually combined with interleaving. Compression makes link utilization more efficient; it is a QoS technique that actually makes more bandwidth available. Fragmentation aims to reduce the expected delay of packets by reducing the maximum packet size over a circuit or connection.

Compression reduces the size of the data to be transferred; therefore, it increases throughput and reduces overall delay. Many compression algorithms have been developed over time; an example is Lempel-Ziv (LZ), used by Stacker compression. Most compression algorithms take advantage of, and remove, repeated patterns and redundancy in data. One main difference between compression algorithms is the type of data each has been optimized for. For example, MPEG was developed for and works well at compressing video, whereas the Huffman algorithm compresses text-based data well.

The success of compression algorithms is measured and expressed as the ratio of raw data to compressed data; a ratio of 2:1 is common. According to Shannon's theorem, compression has a theoretical limit; it is believed that today's algorithms running on high-end CPUs can reach the highest possible compression levels. If compression is hardware based, the main CPU's cycles are not used; on the other hand, if compression is software based, the main CPU is interrupted and its cycles are consumed performing compression. For that reason, when possible, hardware compression is recommended over software compression. Compression options include Layer 2 payload compression and upper-layer (Layer 3 and 4) header compression.
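As a quick illustration of what a compression ratio means in practice, the following Python sketch converts a ratio into effective throughput. The 512-kbps link speed is an assumed example value, not a figure from the text:

```python
def effective_throughput(link_bps, compression_ratio):
    """Effective application-level throughput over a compressed link.

    A 2:1 ratio means every transmitted bit carries two bits of
    original data, roughly doubling the usable bandwidth.
    """
    return link_bps * compression_ratio

# A 512-kbps link with the common 2:1 ratio behaves like ~1024 kbps:
print(effective_throughput(512_000, 2.0))  # 1024000.0
```

Real-world gains vary with the data: already-compressed traffic (for example, JPEG images) yields ratios close to 1:1, which is one reason compression is applied selectively.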


Layer 2 Payload Compression

Layer 2 payload compression, as the name implies, compresses the entire payload of a Layer 2 frame. For example, if a Layer 2 frame encapsulates an IP packet, the entire IP packet is compressed. Layer 2 payload compression is performed on a link-by-link basis; it can be performed on WAN connections such as PPP, Frame Relay, high-level data link control (HDLC), X.25, and Link Access Procedure, Balanced (LAPB). Cisco IOS supports Stacker, Predictor, and Microsoft Point-to-Point Compression (MPPC) as Layer 2 compression methods. The primary difference between these methods is their overhead and utilization of CPU and memory.

Because Layer 2 payload compression reduces the size of the frame, serialization delay is reduced. The increase in available bandwidth (hence throughput) depends on the algorithm's efficiency. Depending on the complexity of the compression algorithm and whether the compression is software based or hardware based (or hardware assisted), compression introduces some amount of delay. However, overall delay (latency) is reduced, especially on low-speed links, whenever Layer 2 compression is used. Layer 2 payload compression is useful over circuits or connections that require the Layer 2 headers to remain intact. For example, over a Frame Relay or ATM circuit you can use Layer 2 payload compression. Link compression, on the other hand, compresses the entire Layer 2 data unit including its header, which won't work over Frame Relay and ATM, but works well over PPP or HDLC connections.

Figure 5-8 shows three cases, the first of which makes no use of payload compression. The second and third scenarios in Figure 5-8 use software-based and hardware-based Layer 2 payload compression, respectively. Hardware compression and hardware-assisted compression are recommended, because they are more CPU efficient than software-based compression. The throughput gain that a Layer 2 payload compression algorithm yields depends on the algorithm itself and not on whether it is software or hardware based. In Figure 5-8, hardware compression resulted in the least overall delay, while both compression options yielded the same throughput gain.

Figure 5-8 Layer 2 Payload Compression Options and Results


Header Compression

Header compression reduces serialization delay and results in less bandwidth usage, yielding more throughput and more available bandwidth. As the name implies, header compression compresses headers only; for example, RTP header compression compresses Real-time Transport Protocol (RTP), User Datagram Protocol (UDP), and IP headers, but it does not compress the application data. This makes header compression especially useful for cases in which the application payload size is small. Without header compression, the header (overhead)-to-payload (data) ratio is large, but with header compression, the overhead-to-data ratio is reduced significantly.

Common header compression options such as TCP header compression and RTP header compression use a simple yet effective technique. Because the headers of the packets in a single flow are identical (some exceptions may apply), instead of sending the identical (and relatively large) header with every packet, a number or index that refers to that entire header is sent instead. This technique is based on a dictionary style of compression algorithms, in which phrases or blocks of data are replaced with a short reference to that phrase or block of data. The receiving end, based on the reference number or index, places the real header back on the packet and forwards it.
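The replace-header-with-index idea can be sketched as a toy model in Python. This is a hypothetical illustration of the dictionary technique only; real TCP/RTP header compression maintains per-flow context on both ends and sends small deltas for the fields that do change:

```python
# Toy model of dictionary-style header compression: the sender replaces
# a flow's (essentially identical) header with a short index; the
# receiver restores the full header from its copy of the dictionary.

class HeaderCompressor:
    def __init__(self):
        self.dictionary = {}   # full header -> index
        self.reverse = {}      # index -> full header

    def compress(self, header: bytes) -> bytes:
        if header not in self.dictionary:
            index = len(self.dictionary)   # toy model: supports 256 flows
            self.dictionary[header] = index
            self.reverse[index] = header
        # Send a 1-byte index instead of the full header.
        return bytes([self.dictionary[header]])

    def decompress(self, index_byte: bytes) -> bytes:
        return self.reverse[index_byte[0]]

ctx = HeaderCompressor()                 # shared context, both link ends
full = b"\x45" * 40                      # stand-in for 40-byte RTP/UDP/IP headers
tiny = ctx.compress(full)                # packets now carry 1 byte of header
assert len(tiny) == 1
assert ctx.decompress(tiny) == full      # receiver restores the original header
```

The 40-to-1-byte reduction here is idealized; cRTP typically reduces the 40-byte RTP/UDP/IP headers to 2 to 4 bytes, as discussed later in this section.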

When you enable TCP or RTP header compression on a link, all TCP or RTP flows are header-compressed as a result. First, note that this is done on a link-by-link basis. Second, note that you cannot enable the feature on a subset of sessions or flows. If you plan to perform header compression only on specific packet types or applications, you need class-based header compression, which is configured by applying the appropriate IOS commands to the desired classes within a policy map using MQC.

Header compression is not CPU-intensive; therefore, the extra delay introduced due to header compression is negligible. Assume that a 512-Kbps link exists between two routers, similar to the one shown in Figure 5-9. In the first case shown in Figure 5-9, header compression is not used, and the forwarding delay of 1 ms and data propagation delay of 8 ms yield a total delay of 9 ms between the two routers shown. With no compression performed, the link throughput is the same as the link bandwidth, which is 512 Kbps. In Figure 5-9, the link where header compression is performed shows a processing delay of 2 ms but a data propagation delay of only 4 ms, yielding a total delay of 6 ms. Furthermore, the link where header compression is performed provides more throughput, in this case a total throughput of 716 Kbps.

Figure 5-9 Header Compression Results


If the links shown in Figure 5-9 are configured as PPP links and RTP packets carry 20-byte voice payloads through them, the header (overhead) to payload (data) ratio can be reduced significantly with RTP header compression. Since a PPP header is 6 bytes long and RTP/UDP/IP headers add up to 40 bytes, the header to payload ratio is (40 + 6) / 20, which equals 230 percent without RTP header compression. On the other hand, with RTP header compression, if the no checksum option is used, the RTP/UDP/IP header is reduced to 2 bytes only; the header (overhead)-to-payload (data) ratio in this case reduces to (2 + 6) / 20, which equals 40 percent.
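The overhead arithmetic above can be checked directly in Python, using the byte counts from the text:

```python
PPP_HDR = 6       # PPP header, bytes
RTP_UDP_IP = 40   # uncompressed RTP/UDP/IP headers, bytes
CRTP_HDR = 2      # cRTP header with the no-checksum option, bytes
PAYLOAD = 20      # voice payload, bytes

# Header (overhead)-to-payload (data) ratio, with and without cRTP:
uncompressed = (RTP_UDP_IP + PPP_HDR) / PAYLOAD   # 46 / 20
compressed = (CRTP_HDR + PPP_HDR) / PAYLOAD       # 8 / 20

print(f"{uncompressed:.0%}")  # 230%
print(f"{compressed:.0%}")    # 40%
```

The per-packet overhead drops from 46 bytes to 8 bytes, which is why cRTP is so effective for small-payload voice traffic.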

Link Fragmentation and Interleaving

When an interface is congested, packets first go through the software queue and then are forwarded to the hardware queue; when the interface has no congestion, packets skip the software queue and go straight to the hardware queue. You can use advanced queuing methods such as LLQ to minimize the software queuing delay that delay-sensitive packets such as VoIP experience.

Packets must always go through the hardware queue, which is FIFO based. If a VoIP packet ends up behind one or more large packets in the hardware queue, it might experience too much delay in that area and end up going over its total end-to-end delay budget. The goal for the end-to-end delay budget of a VoIP packet (one-way) is 150 ms to 200 ms.

Imagine that a VoIP packet ends up in a Tx (HW) queue behind a 1500-byte frame that has to be transmitted by the interface hardware out of a 256-Kbps link. The amount of time that the VoIP packet has to wait for transmission of the 1500-byte frame ahead of it is 47 ms (1500 bytes x 8 bits/byte / 256,000 bits/sec). Typically, during the design phase, a 10- to 15-ms delay budget is allocated to serialization on slow links. This example clearly demonstrates that in the presence of large data units, the delay can go well beyond the normally expected value.
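The serialization delay in this example follows from a one-line formula, shown here in Python:

```python
def serialization_delay_ms(frame_bytes, link_bps):
    """Time to clock a frame onto the wire, in milliseconds."""
    return frame_bytes * 8 / link_bps * 1000

# 1500-byte frame on a 256-kbps link: roughly 47 ms
print(round(serialization_delay_ms(1500, 256_000)))  # 47
```

At T1 speed (1.544 Mbps) the same frame serializes in under 8 ms, which is why LFI matters mainly on sub-T1 links.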

It is possible to mitigate the delay imposed by large data units ahead of VoIP (or other delay-sensitive) packets in the hardware (Tx) queue. The solution is link fragmentation and interleaving (LFI). You enable fragmentation on a link and specify the maximum data unit size (called the fragment size). Fragmentation must be accompanied by interleaving; otherwise, fragmentation has no effect. Interleaving allows packets of other flows to get between fragments of large data units in the queue.
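Given a serialization-delay budget, the appropriate fragment size follows directly. The Python calculation below uses the 10-ms budget and 256-kbps rate from the earlier example; the exact budget chosen is a design decision:

```python
def fragment_size_bytes(link_bps, target_delay_ms):
    """Largest fragment whose serialization delay fits the budget."""
    return link_bps * target_delay_ms / 1000 / 8

# For a 10-ms budget on a 256-kbps link, fragments of at most 320 bytes:
print(fragment_size_bytes(256_000, 10))  # 320.0
```

With 320-byte fragments, a voice packet interleaved into the queue waits at most about 10 ms for the fragment ahead of it, instead of 47 ms for a full 1500-byte frame.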

Applying Link Efficiency Mechanisms

Link efficiency mechanisms discussed in this section might not be necessary on all interfaces and links. It is important that you identify network bottlenecks and work on the problem spots. On fast links, many link efficiency mechanisms are not supported, and if they are, they might have negative results. On slow links and where bottlenecks are recognized, you must calculate the overhead-to-data ratios and consider all compression options. On some links, you can perform full link compression. On some, you can perform Layer 2 payload compression, and on others, you will probably perform header compression such as RTP or TCP header compression only. Link fragmentation and interleaving is always a good option to consider on slow links. It is noteworthy that compounding compression methods usually has an adverse effect and reduces throughput.

At the WAN edge on WAN links with equal or less bandwidth than T1/E1, it is recommended to enable both TCP/RTP header compression and LFI. These improve WAN link utilization and reduce the serialization delay. Because Layer 2 payload compression is CPU-intensive, it is recommended only if it can be hardware-based or hardware-assisted.

Foundation Summary

The "Foundation Summary" is a collection of information that provides a convenient review of many key concepts in this chapter. If you are already comfortable with the topics in this chapter, this summary can help you recall a few details. If you just read this chapter, this review should help solidify some key facts. If you are doing your final preparation before the exam, the information in this section is a convenient way to review the day before the exam.

Tail drop is the default queuing response to congestion. It has three significant drawbacks:

■ TCP synchronization—Packet drops from many sessions at the same time cause TCP sessions to slow down (decrease send window size) and speed up (increase send window size) at the same time, causing inefficient link utilization.

■ TCP starvation—Aggressive and non-TCP flows might fill up the queues, leaving little or no room for other less aggressive applications and TCP packets (especially after slowdown).

■ No differentiated drop—Tail drop does not punish aggressive flows in particular, and it does not differentiate between high- and low-priority packets.

RED avoids tail drop by randomly dropping packets when the queue size is above a min-threshold value, and it increases drop rate as the average queue size increases. RED has the following benefits:

■ Only the TCP sessions whose packets are dropped slow down.

■ The average queue size is kept small, reducing the chances of tail drop.

■ Link utilization becomes higher and more efficient.

RED, or RED profile, is configured using three parameters: minimum threshold, maximum threshold, and mark probability denominator. When the average queue size is below the minimum threshold, RED is in the no-drop mode. When the average queue size is between the minimum threshold and the maximum threshold, RED is in random-drop mode. When the average queue size is above the maximum threshold, RED is in full-drop mode.
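The three RED modes can be expressed as a simple drop-probability function. This is an illustrative Python sketch of the RED profile described above; the parameter names are descriptive, not IOS syntax, and real RED operates on an exponentially weighted average queue size:

```python
def red_drop_probability(avg_queue, min_th, max_th, mpd):
    """Drop probability under a RED profile.

    mpd is the mark probability denominator: at max_th, one packet
    in mpd is dropped, so the peak random-drop probability is 1/mpd.
    """
    if avg_queue < min_th:
        return 0.0                      # no-drop mode
    if avg_queue >= max_th:
        return 1.0                      # full-drop (tail-drop) mode
    # Random-drop mode: probability ramps linearly up to 1/mpd.
    return (avg_queue - min_th) / (max_th - min_th) / mpd

# min-threshold 20, max-threshold 40, mark probability denominator 10:
print(red_drop_probability(10, 20, 40, 10))  # 0.0  (below min threshold)
print(red_drop_probability(30, 20, 40, 10))  # 0.05 (halfway up the ramp)
print(red_drop_probability(45, 20, 40, 10))  # 1.0  (above max threshold)
```

WRED simply maintains one such profile per IP precedence or DSCP value, giving less important packets a lower min-threshold or a higher peak drop probability.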

WRED can use multiple profiles based on IP precedence (up to 8 profiles) or DSCP (up to 64 profiles). Using profiles, WRED can drop less important packets more aggressively than more important packets. You can apply WRED to an interface, a virtual circuit (VC), or a class within a policy map. The last case is called class-based WRED (CBWRED). CBWRED is configured in combination with CBWFQ.

Traffic-shaping and policing are traffic-conditioning tools. These mechanisms classify packets and measure traffic rates. Shaping queues excess packets to stay within a desired rate, whereas policing either re-marks or drops excess packets to keep them within a rate limit.

Policing is used to do the following:

■ Limit access to resources when high-speed access is used but not desired (subrate access)

■ Limit the traffic rate of certain applications or traffic classes

■ Mark down (recolor) exceeding traffic at Layer 2 or Layer 3

Shaping is used to do the following:

■ Prevent and manage congestion in ATM, Frame Relay, and Metro Ethernet networks, where asymmetric bandwidths are used along the traffic path.

■ Regulate the sending traffic rate to match the subscribed (committed) rate in ATM, Frame Relay, or Metro Ethernet networks.

Following are the similarities and differences between policing and shaping:

■ Both traffic shaping and traffic policing measure traffic. (Sometimes, different traffic classes are measured separately.)

■ Policing can be applied to the inbound and outbound traffic (with respect to an interface), but traffic shaping applies only to outbound traffic.

■ Shaping buffers excess traffic and sends it according to a preconfigured rate, whereas policing drops or re-marks excess traffic.

■ Shaping requires memory for buffering excess traffic, which creates variable delay and jitter; policing does not require extra memory, and it does not impose variable delay.

■ Policing can re-mark traffic, but traffic shaping does not re-mark traffic.

■ Traffic shaping can be configured to shape traffic based on network conditions and signals, but policing does not respond to network conditions and signals.

The operating systems on Cisco devices measure traffic rates using a token bucket scheme.

The important points to remember about a token bucket scheme are these:

■ To be able to transmit 1 byte of data, the bucket must have one token.

■ If the size of data to be transmitted (in bytes) is smaller than the number of tokens, the traffic is called conforming. When traffic conforms, as many tokens as the size of data are removed from the bucket, and the conform action, which is usually forward data, is performed.

■ If the size of data to be transmitted (in bytes) is larger than the number of tokens, the traffic is called exceeding. In the exceed situation, tokens are not removed from the bucket, but the action performed (exceed action) is either buffer and send data later (in case of shaping), or it is drop or mark data (in the case of policing).
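The conform/exceed decision described in the points above can be sketched in Python. This is an illustrative toy model that ignores token replenishment over time (real policers and shapers add tokens to the bucket at the committed rate):

```python
# Token bucket sketch: one token per byte, as described above.

class TokenBucket:
    def __init__(self, capacity_bytes):
        self.tokens = capacity_bytes

    def offer(self, size_bytes):
        """Return 'conform' or 'exceed' for a packet of size_bytes."""
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes   # conform: tokens are removed
            return "conform"            # conform action: forward the data
        return "exceed"                 # exceed: tokens stay in the bucket

bucket = TokenBucket(1500)
print(bucket.offer(1000))  # conform
print(bucket.offer(1000))  # exceed (only 500 tokens remain)
print(bucket.tokens)       # 500
```

On exceed, a shaper would buffer the packet and send it later when enough tokens have accumulated, whereas a policer would drop or re-mark it immediately.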

Similarities and differences between class-based shaping and FRTS are as follows:

■ FRTS controls Frame Relay traffic only and can be applied to a Frame Relay subinterface or Frame Relay DLCI.

■ Whereas Frame Relay traffic shaping supports Frame Relay fragmentation and interleaving (FRF.12), class-based traffic shaping does not.

■ Both class-based traffic shaping and FRTS interact with and support Frame Relay network congestion signals such as BECN and FECN.

■ A router that is receiving BECNs shapes its outgoing Frame Relay traffic to a lower rate, and if it receives FECNs, even if it has no traffic for the other end, it sends test frames with the BECN bit set to inform the other end to slow down.

Compression identifies patterns in data and removes redundancy as much as possible. It increases throughput and decreases latency. Many compression algorithms exist for different types of data. Hardware compression (or hardware-assisted) is preferred over software compression because it does not use main CPU cycles. Payload compression reduces the size of the payload. Header compression reduces the header overhead.

Link efficiency mechanisms are often deployed on WAN links to increase the throughput and decrease the delay and jitter. Cisco IOS link efficiency mechanisms include the following:

■ Layer 2 payload compression (Stacker, Predictor, MPPC)

■ Header compression (TCP, RTP, and class-based)

■ Link Fragmentation and Interleaving (LFI)
