Database Reference
In-Depth Information
Data Compression
The Enterprise Edition of SQL Server 2008 and above allows you to reduce the size of tables by implementing data
compression. There are two types of data compression available: row and page. Row compression reduces the size
of rows by using a different row format, which eliminates the unused storage space from fixed-length data. Page
compression works on the data page scope, and it removes duplicated byte sequences from a page.
Data pages in the buffer pool store data in the same format as data is stored on disk. Compression is
implemented transparently to other SQL Server features; that is, SQL Server components that access data do not know
if compression is used or not.
Even though data compression allows you to put more rows into data pages, it does not increase the amount of
data that a row can store. The 8,060 bytes maximum row size limitation still applies, regardless of the compression
settings. SQL Server guarantees that the disabling of data compression will always succeed and, therefore, an
uncompressed row must always fit on a data page.
Let's examine how both compression types are implemented.
Row Compression
As you will remember, the regular row format, called FixedVar , stores fixed- and variable-length data in different
sections of the row. The benefit of such an approach is fast access to column data. Fixed-length columns always have
the same in-row offset. The offset of variable-length column data can also be easily obtained based on the offset array
information.
This performance, however, comes at a cost. Fixed-length columns always use the same storage space based on
the largest value of the data type. For example, the int data type always uses four bytes, even when it stored 0 or
NULL values.
Unfortunately, unused space quickly adds up. One unused byte leads to almost 350MB of unused space per
year in a table that collects one million rows per day. The table uses more space on disk and in the buffer pool, which
increases the number of required I/O operations and negatively affects the performance of the system.
Row compression addresses this problem by implementing another row format, called CD , which stands for
Column Descriptor .” With this format, every row stores the column and data description information in the row using
the exact amount of storage space required for the value. Figure 4-15 illustrates the CD row format.
Figure 4-15. CD row format
Similar to the FixedVar row format, data in the CD format is separated into two different sections: Short Data
Region and Long Data Region . However, the separation is based on the size of the data rather than on the data type.
The Short Data Region stores data up to 8 bytes in size. Larger values are stored in the Long Data Region. Let's look at
the row format in depth.
 
Search WWH ::




Custom Search