At the beginning of the tests, I created three different heap tables and inserted one million rows into each of them. After that, I created clustered indexes with different compression settings and FILLFACTOR=100. This workflow led to zero index fragmentation and fully populated data pages.
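The index creation step could have looked like the following sketch. The table, index, and column names here are hypothetical placeholders, not the objects used in the actual tests.

-- A minimal sketch of the setup, assuming hypothetical heap tables
-- dbo.NoCompression, dbo.RowCompression, and dbo.PageCompression,
-- each with an integer ID key and a fixed-length integer column IntData
CREATE CLUSTERED INDEX IDX_NoCompression_ID
ON dbo.NoCompression(ID)
WITH (DATA_COMPRESSION = NONE, FILLFACTOR = 100);

CREATE CLUSTERED INDEX IDX_RowCompression_ID
ON dbo.RowCompression(ID)
WITH (DATA_COMPRESSION = ROW, FILLFACTOR = 100);

CREATE CLUSTERED INDEX IDX_PageCompression_ID
ON dbo.PageCompression(ID)
WITH (DATA_COMPRESSION = PAGE, FILLFACTOR = 100);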
During the first test, I ran SELECT statements that scanned all of the clustered indexes and accessed some of the row data. The second test updated every row in the tables, changing the value of a fixed-length column in a way that did not increase the row size. The third test inserted another batch of one million rows into the tables. Finally, I rebuilt all of the clustered indexes.
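Against the hypothetical dbo.RowCompression table from the sketch above, the test workload would follow this general pattern (the actual statements used in the tests may differ):

-- Test 1: scan the clustered index, accessing row data
SELECT COUNT(*)
FROM dbo.RowCompression
WHERE IntData > 0
OPTION (MAXDOP 1);

-- Test 2: update every row, changing the fixed-length IntData column
-- without increasing the row size
UPDATE dbo.RowCompression
SET IntData = IntData + 1
OPTION (MAXDOP 1);

-- Test 3: insert another one million rows (the source of the new rows is illustrative)
INSERT INTO dbo.RowCompression(ID, IntData)
SELECT ID + 1000000, IntData
FROM dbo.RowCompression
OPTION (MAXDOP 1);

-- Finally: rebuild the clustered index on a single CPU
ALTER INDEX ALL ON dbo.RowCompression
REBUILD WITH (MAXDOP = 1);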
You can see the execution statistics in Table 4-7. All tests ran with a warm cache, with the data pages cached in the buffer pool. A cold cache could reduce the difference in execution times between the queries against compressed and non-compressed data, because queries against compressed data perform less physical I/O.
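If you want to repeat the measurements with a cold cache instead, you can flush dirty pages and clear the buffer pool before each run:

-- Emulate a cold cache before a test run (avoid on production servers)
CHECKPOINT;
DBCC DROPCLEANBUFFERS;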
Table 4-7. Data Compression, storage space, and performance

                    Size   Avg. Row Size   SELECT Elapsed   UPDATE Elapsed   INSERT Elapsed   INDEX REBUILD
                    (MB)   (bytes)         Time (ms)        Time (ms)        Time (ms)        Elapsed Time (ms)
No Compression      285    287             298              3,745            12,596           21,537
Row Compression     181    183             224              12,618           17,808           33,074
Page Compression    94     81              267              36,690           39,121           76,694
All statements were forced to run on a single CPU by using a MAXDOP 1 query hint. Using parallelism would decrease the query execution times; however, it would add the overhead of parallelism management during query execution. We will discuss such overhead later in this book.
As you can see, data compression improved the performance of the queries that read and scan the data, even with no physical I/O involved. This leads us to conclude that reading compressed data adds very little overhead to the system. However, compression decreased the performance of data modifications; modifying compressed data is expensive, especially with page compression.
CPU overhead, however, is not the only factor to consider. Compression reduces the size of the rows and the number of data pages required to store them. Indexes use less space in the buffer pool, which allows you to cache more data in the system; this can significantly reduce the amount of physical I/O and improve system performance. Furthermore, data compression reduces the size of the database, and thus the size of backup files and their storage costs.
Obviously, it is impossible to provide generic advice when it comes to using data compression. In some cases, especially on heavily CPU-bound servers, compression can degrade system performance. In most cases, however, compression will benefit the system. Row compression is usually a good choice when the data is volatile; page compression, on the other hand, is better suited to static data. You should still analyze each case individually, taking CPU and I/O load, data modification patterns, and various other factors into account.
You should also estimate how much space compression actually saves you. There is no reason to compress the data if the space savings are minimal. Row compression reduces the storage space used by fixed-length data; it does not help much with variable-length data. The results of page compression depend on the data itself rather than on the data types.
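SQL Server can produce such an estimate with the sp_estimate_data_compression_savings stored procedure. Here is a quick sketch, run against the hypothetical dbo.NoCompression table from the earlier examples:

-- Estimate how much space PAGE compression would save for the table
EXEC sp_estimate_data_compression_savings
    @schema_name = 'dbo',
    @object_name = 'NoCompression',
    @index_id = NULL,           -- all indexes
    @partition_number = NULL,   -- all partitions
    @data_compression = 'PAGE';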
Finally, both data compression methods work with in-row data only, and they will not compress data stored in
row-overflow and LOB pages.
 
 