Data Compression (Networking)

Data compression has become a standard feature of most bridges and routers, as well as modems. In its simplest implementation, compression capitalizes on the redundancies found in the data. The algorithm detects repeating characters or strings of characters and represents them as a symbol or token. At the receiving end, the process works in reverse to restore the original data.
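The token-substitution idea described above can be illustrated with a minimal run-length encoding sketch in Python; this is an illustrative example of replacing repeated characters with a count-and-byte token, not the algorithm any particular bridge or router actually uses:

```python
def rle_encode(data: bytes) -> bytes:
    """Encode runs of repeating bytes as (count, byte) pairs."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        # Count how far the current byte repeats (count fits in one byte).
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out.append(run)
        out.append(data[i])
        i += run
    return bytes(out)

def rle_decode(encoded: bytes) -> bytes:
    """Reverse the process at the receiving end to restore the data."""
    out = bytearray()
    for count, value in zip(encoded[::2], encoded[1::2]):
        out.extend([value] * count)
    return bytes(out)
```

On highly repetitive input such as `b"AAAAABBBCCCCCCCCCC"` (18 bytes), the encoded form is only 6 bytes, and decoding restores the original exactly.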

Compression Efficiency

Compression efficiency tends to differ by application. The compression ratio can be as high as 6-to-1 when the traffic consists of heavy-duty file transfers. The compression ratio is less than 4-to-1 when the traffic is mostly database queries. When there are only “keep-alive” signals or sporadic query traffic on a T1 line, the compression ratio can dip below 2-to-1. Encrypted data exhibits little or no compression; in fact, attempting to compress it tends to expand the data and consume more bandwidth. However, if this expansion is detected and compression is withheld until the encrypted data has been completely transmitted, the need for extra bandwidth can be avoided.
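The behavior of encrypted traffic can be demonstrated with a short Python sketch using the standard zlib library: random bytes (a stand-in for ciphertext, which is statistically random) barely compress, while redundant keep-alive traffic shrinks dramatically. The exact sizes depend on the compressor settings and are illustrative only:

```python
import os
import zlib

redundant = b"keep-alive " * 200       # 2,200 bytes of repetitive traffic
encrypted_like = os.urandom(2200)      # random bytes standing in for ciphertext

small = len(zlib.compress(redundant))
large = len(zlib.compress(encrypted_like))

# The redundant traffic compresses heavily; the random data does not,
# and framing overhead can even make it slightly larger than the input.
print(small, large)
```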

The use of data compression is particularly advantageous in the following situations:

■ When data traffic is increasing due to the addition or expansion of LANs and associated data-intensive, bursty traffic

■ When LAN and legacy traffic must contend for the same limited bandwidth

■ When reducing or limiting the number of 56/64-Kbps lines is desirable to reduce operational costs

■ When lowering the Committed Information Rate (CIR) for Frame Relay services or sending fewer packets over an X.25 network can result in substantial cost savings

The greatest cost savings from data compression most often occur at remote sites, where bandwidth is typically in short supply. Data compression can extend the life of 56/64-Kbps leased lines, avoiding the need for more expensive fractional T1 lines or N × 64 services. Depending on the application, a 56/64-Kbps leased line can deliver throughput of 112 Kbps to 256 Kbps or higher when data compression is applied.
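The throughput figures above follow from multiplying the line rate by the sustained compression ratio, as this small sketch shows (56 Kbps at 2-to-1 yields 112 Kbps; 64 Kbps at 4-to-1 yields 256 Kbps). This is an idealized calculation; real ratios vary with the traffic mix:

```python
def effective_kbps(line_kbps: int, ratio: int) -> int:
    """Effective throughput = line rate x sustained compression ratio."""
    return line_kbps * ratio

for line in (56, 64):
    for ratio in (2, 4):
        print(f"{line} Kbps at {ratio}-to-1 -> {effective_kbps(line, ratio)} Kbps")
```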

History

Symplex Communications Corp. pioneered data compression for bridges and routers with the introduction of its Datamizer I in 1983, which offered 2-to-1 compression to achieve 19.2-Kbps throughput on 9.6-Kbps lines. Datamizer IV was the first device to provide 4-to-1 compression over leased lines and could be configured to activate additional lines as traffic volume increased. It also could be configured to automatically reroute data if a line failed.

The company’s Datamizer V is adept at handling a combination of hard- and easy-to-compress data types in that it automatically adjusts compression techniques to maximize the throughput benefit on a per-packet basis. If users experience compression of less than 2-to-1, the Datamizer V’s default setting for easy-to-compress data can be changed to the hard-to-compress option to achieve better performance. The newer multi-port Datamizer 6 supports higher speeds, provides additional throughput performance, and offers optional encryption features. It operates seamlessly over virtually all forms of telecom services, including frame relay, dedicated lines, satellite links, and ISDN.

Of course, there are other products that implement data compression on WAN links. Fourelle Systems, for example, offers an innovative hardware and firmware solution called Venturi that increases overall bandwidth for both wired and wireless TCP/IP networks. In a typical installation, a standard client application (e.g., Web browser) communicates through a local Venturi proxy, which in turn transmits data across a bandwidth-constrained link to a Venturi compression server proxy. The Venturi server then communicates with a network-based application (e.g., Web server). Each proxy communicates with its respective application using standard TCP/IP protocols, requiring no change to the application. Using an optimized IP-based transport, Venturi combines several data-dependent compression techniques, resulting in a 50-percent to 99-percent reduction in the amount of data transmitted.

Types of Data Compression

There are several different data compression methods in use today over wide area networks, among them TCP/IP header compression, link compression, and multi-channel payload compression. Depending on the method, there can be a significant tradeoff between lower bandwidth consumption and increased packet delay.

TCP/IP HEADER COMPRESSION With TCP/IP header compression, the packet headers are compressed but the data payload remains unchanged. Since the TCP/IP header must be replaced at each node for IP routing to be possible, this compression method requires hop-by-hop compression and decompression processing. This adds delay to each compressed/decompressed packet and puts an added burden on the router’s CPU.
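The general idea behind header compression can be sketched as follows. This is a simplified, hypothetical scheme in which a small context ID stands in for the full header after the first packet on a flow; it is not Van Jacobson header compression or any specific router’s implementation:

```python
# Hypothetical scheme: the first packet on a flow carries the full header
# and establishes a 1-byte context ID; later packets carry only the ID
# plus the fields that change (here, just a sequence number).

class HeaderCompressor:
    def __init__(self):
        self.contexts = {}          # flow tuple -> context ID
        self.next_id = 0

    def compress(self, src, dst, sport, dport, seq):
        flow = (src, dst, sport, dport)
        if flow not in self.contexts:
            self.contexts[flow] = self.next_id
            self.next_id += 1
            # Full header sent once to establish the shared context.
            return ("FULL", self.contexts[flow], flow, seq)
        # Later packets send only the context ID and the changing field.
        return ("COMP", self.contexts[flow], seq)

class HeaderDecompressor:
    def __init__(self):
        self.contexts = {}          # context ID -> flow tuple

    def decompress(self, packet):
        if packet[0] == "FULL":
            _, cid, flow, seq = packet
            self.contexts[cid] = flow
            return flow + (seq,)
        _, cid, seq = packet
        return self.contexts[cid] + (seq,)
```

Because both ends must hold matching context tables, and the real header must be regenerated wherever routing decisions are made, this work repeats at every hop, which is the source of the per-node delay and CPU load noted above.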

LINK COMPRESSION With link compression, the entire frame, both protocol header and payload, is compressed. This form of compression is typically used in LAN-only or legacy-only environments. However, this method requires error correction and packet sequencing software, which adds to the processing overhead already introduced by link compression and results in increased packet delays. Also, like TCP/IP header compression, link compression requires hop-by-hop compression and decompression, so processor loading and packet delays occur at each router node the data traverses.

With link compression, a single data compression vocabulary dictionary, or history buffer, is maintained for all virtual circuits compressed over the WAN link. This buffer holds a running history of what data has been transmitted to help make future transmissions more efficient. To obtain optimal compression ratios, the history buffer must be large, which requires a significant amount of memory. In addition, the vocabulary dictionary resets at the end of each frame, so this technique offers lower compression ratios than multi-channel, multi-history buffer (vocabulary) data compression methods. This is particularly true when mixed LAN and serial protocol traffic is transmitted over the WAN link and frame sizes are 2K bytes or less. Lower compression ratios translate into higher bandwidth costs, while adding memory to improve the ratios increases the up-front cost of the solution.

MULTI-CHANNEL PAYLOAD DATA COMPRESSION By using separate history buffers, or vocabularies, for each virtual circuit, multi-channel payload data compression can yield higher compression ratios while requiring much less memory than other data compression methods. This is particularly true in cases where mixed LAN and serial protocol traffic traverses the network. Higher compression ratios translate into lower WAN bandwidth requirements and greater cost savings.
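A rough Python sketch of the per-circuit history idea maintains one zlib compression stream per virtual circuit, so each channel accumulates its own history; `Z_SYNC_FLUSH` emits self-contained frames while preserving the history window for later packets. This is illustrative only, not any vendor’s implementation:

```python
import zlib

class MultiChannelCompressor:
    """One compression stream (history buffer) per virtual circuit."""
    def __init__(self):
        self.channels = {}

    def compress(self, circuit_id, payload: bytes) -> bytes:
        if circuit_id not in self.channels:
            self.channels[circuit_id] = zlib.compressobj()
        stream = self.channels[circuit_id]
        # Z_SYNC_FLUSH emits a complete, decodable frame while keeping
        # the history window available for later packets on this circuit.
        return stream.compress(payload) + stream.flush(zlib.Z_SYNC_FLUSH)

class MultiChannelDecompressor:
    """Matching per-circuit decompression streams at the far end."""
    def __init__(self):
        self.channels = {}

    def decompress(self, circuit_id, frame: bytes) -> bytes:
        if circuit_id not in self.channels:
            self.channels[circuit_id] = zlib.decompressobj()
        return self.channels[circuit_id].decompress(frame)
```

Because each circuit’s history persists across frames, a payload that repeats on the same circuit compresses far better the second time it is sent, which is the advantage the per-circuit history buffers provide.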

But performance varies because vendors define payload data compression differently. Some consider it to be compression of everything that follows the IP header. However, the IP header itself can account for a significant number of bytes, so for overall compression to be effective, header compression must also be applied. This adds to the processing burden of the CPU and increases packet delays.

External Data Compression Solutions

Although bridges and routers can perform data compression, external compression devices are often required to connect to higher-speed links. The reason is that data compression is extremely processor intensive, with multi-channel payload data compression being the most burdensome. The faster the packets must move through the router, the more difficult it is for the router’s processor to keep up.

Internal data compression engines can provide multi-channel compression, lower cost, and simplified management. By using a separate internal digital signal processor (DSP) for data compression instead of a software-only approach, all of the router’s other basic functions can continue to be processed simultaneously. This parallel processing approach minimizes the packet delay that can occur when the router’s CPU is forced to handle all of these tasks itself.

Last Word

Data compression will become increasingly important to most organizations as the volume of data traffic at branch locations begins to exceed the capacity of the wide area links. Multi-channel payload solutions provide the highest compression ratios and reduce the number of packets transmitted across the network. Reducing packet latency can be effectively achieved via a dedicated processor like a DSP and by employing end-to-end compression techniques, rather than node-to-node compression/decompression. All of these factors contribute to reducing WAN circuit and equipment costs as well as improving the network response time and availability for user applications.
