At the beginning of the tests, I created three different heap tables and inserted one million rows into each of them. After that, I created clustered indexes with different compression settings and FILLFACTOR=100. This workflow led to zero index fragmentation and fully populated data pages.
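The index creation step could have looked like the following sketch. The table, index, and column names here are hypothetical placeholders, not the objects used in the actual tests.

-- A minimal sketch of the setup, assuming hypothetical heap tables
-- dbo.NoCompression, dbo.RowCompression, and dbo.PageCompression,
-- each with an integer ID key and a fixed-length integer column IntData
CREATE CLUSTERED INDEX IDX_NoCompression_ID
ON dbo.NoCompression(ID)
WITH (DATA_COMPRESSION = NONE, FILLFACTOR = 100);

CREATE CLUSTERED INDEX IDX_RowCompression_ID
ON dbo.RowCompression(ID)
WITH (DATA_COMPRESSION = ROW, FILLFACTOR = 100);

CREATE CLUSTERED INDEX IDX_PageCompression_ID
ON dbo.PageCompression(ID)
WITH (DATA_COMPRESSION = PAGE, FILLFACTOR = 100);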
During the first test, I ran SELECT statements that scanned all of the clustered indexes and accessed some of the row data. The second test updated every row in the tables, changing the value of a fixed-length column in a way that did not increase the row size. The third test inserted another batch of one million rows into the tables. Finally, I rebuilt all of the clustered indexes.
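Against the hypothetical dbo.RowCompression table from the sketch above, the test workload would follow this general pattern (the actual statements used in the tests may differ):

-- Test 1: scan the clustered index, accessing row data
SELECT COUNT(*)
FROM dbo.RowCompression
WHERE IntData > 0
OPTION (MAXDOP 1);

-- Test 2: update every row, changing the fixed-length IntData column
-- without increasing the row size
UPDATE dbo.RowCompression
SET IntData = IntData + 1
OPTION (MAXDOP 1);

-- Test 3: insert another one million rows (the source of the new rows is illustrative)
INSERT INTO dbo.RowCompression(ID, IntData)
SELECT ID + 1000000, IntData
FROM dbo.RowCompression
OPTION (MAXDOP 1);

-- Finally: rebuild the clustered index on a single CPU
ALTER INDEX ALL ON dbo.RowCompression
REBUILD WITH (MAXDOP = 1);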
You can see the execution statistics in Table 4-7. All tests ran with a warm cache, with the data pages cached in the buffer pool. A cold cache could reduce the difference in execution times between the queries against compressed and non-compressed data, because queries against compressed data perform less physical I/O.
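If you want to repeat the measurements with a cold cache instead, you can flush dirty pages and clear the buffer pool before each run:

-- Emulate a cold cache before a test run (avoid on production servers)
CHECKPOINT;
DBCC DROPCLEANBUFFERS;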
Table 4-7. Data Compression, storage space, and performance

                    Size   Avg. Row Size   SELECT Elapsed   UPDATE Elapsed   INSERT Elapsed   INDEX REBUILD
                    (MB)   (bytes)         Time (ms)        Time (ms)        Time (ms)        Elapsed Time (ms)
No Compression      285    287             298              3,745            12,596           21,537
Row Compression     181    183             224              12,618           17,808           33,074
Page Compression    94     81              267              36,690           39,121           76,694
All statements were forced to run on a single CPU by using a MAXDOP 1 query hint. Using parallelism would decrease the query execution times; however, it would add the overhead of parallelism management during query execution. We will discuss such overhead later in this book.
As you can see, data compression improved the performance of the queries that read and scan the data, even with no physical I/O involved. This leads us to conclude that reading compressed data adds very little overhead to the system. However, compression decreased the performance of data modifications; modifying compressed data is expensive, especially with page compression.
CPU overhead, however, is not the only factor to consider. Compression reduces the size of the rows and the number of data pages required to store them. Indexes use less space in the buffer pool, which allows you to cache more data in the system; this can significantly reduce the amount of physical I/O and improve system performance. Furthermore, data compression reduces the size of the database, and thus the size of backup files and their storage costs.
Obviously, it is impossible to provide generic advice when it comes to using data compression. In some cases, especially on heavily CPU-bound servers, compression can degrade system performance. In most cases, however, compression will benefit the system. Row compression is usually a good choice when the data is volatile; page compression, on the other hand, is better suited to static data. You should still analyze each case individually, taking CPU and I/O load, data modification patterns, and various other factors into account.
You should also estimate how much space compression actually saves you. There is no reason to compress the data if the space savings are minimal. Row compression reduces the storage space used by fixed-length data; it does not help much with variable-length data. The results of page compression depend on the data itself rather than on the data types.
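SQL Server can produce such an estimate with the sp_estimate_data_compression_savings stored procedure. Here is a quick sketch, run against the hypothetical dbo.NoCompression table from the earlier examples:

-- Estimate how much space PAGE compression would save for the table
EXEC sp_estimate_data_compression_savings
    @schema_name = 'dbo',
    @object_name = 'NoCompression',
    @index_id = NULL,           -- all indexes
    @partition_number = NULL,   -- all partitions
    @data_compression = 'PAGE';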
Finally, both data compression methods work with in-row data only, and they will not compress data stored in
row-overflow and LOB pages.
 
 