as desired. Where necessary, you may even have to take manual control over the creation and/or maintenance of
statistics. (I cover this in the “Manual Maintenance” section, and I cover the precise nature of the functions and shape
of statistics in the “Analyzing Statistics” section.) In the following section, I show you why statistics are important to
indexed columns and nonindexed columns functioning as predicates.
Statistics on an Indexed Column
The usefulness of an index is largely dependent on the statistics of the indexed columns; without statistics, SQL
Server's cost-based query optimizer can't decide upon the most effective way of using an index. To meet this
requirement, SQL Server automatically creates the statistics of an index key whenever the index is created. It isn't
possible to turn this feature off.
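As a quick illustration (the dbo.Person table and the IX_Person_LastName index are placeholders), you can confirm that an index brings its statistics with it by creating the index and then querying sys.stats:

-- Create an index; SQL Server builds statistics on the index key automatically
CREATE INDEX IX_Person_LastName ON dbo.Person (LastName);

-- The new statistics object appears alongside the index in sys.stats
SELECT s.name,
       s.auto_created,
       s.user_created
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID('dbo.Person');

The statistics object carries the same name as the index, and both auto_created and user_created are 0, indicating the statistics were built as part of the index rather than by the auto create statistics feature.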
As data changes, the data retrieval mechanism required to keep the cost of a query low may also change. For
example, if a table has only one matching row for a certain column value, then it makes sense to retrieve the matching
rows from the table by going through the nonclustered index on the column. But if the data in the table changes so
that a large number of rows are added with the same column value, then using the nonclustered index may no longer
make sense. To enable SQL Server to decide on this change in processing strategy as the data changes over time, it is vital to have up-to-date statistics.
SQL Server can keep the statistics on an index updated as the contents of the indexed column are modified.
By default, this feature is turned on and is configurable through the Properties ➤ Options ➤ Auto Update Statistics setting of a database. Updating statistics consumes extra CPU cycles and associated I/O. To optimize the update process, SQL Server uses an efficient algorithm to decide when to execute the update statistics procedure, based on factors such as the number of modifications and the size of the table. Statistics are updated automatically in the following cases:
•	When a table with no rows gets a row
•	When a table has fewer than 500 rows and is increased by 500 or more rows
•	When a table has more than 500 rows and is increased by 500 rows + 20 percent of the number of rows
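For example, a table holding 10,000 rows does not have its statistics refreshed automatically until roughly 500 + (20 percent of 10,000) = 2,500 rows have been modified. On more recent versions of SQL Server, you can check how many modifications have accumulated against each statistics object since its last update with a query like the following sketch (dbo.Person is a placeholder table name):

-- Show the last update time, row count, and modifications since the last update
SELECT s.name,
       sp.last_updated,
       sp.rows,
       sp.modification_counter
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID('dbo.Person');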
This built-in intelligence keeps the CPU utilization by each process low. It's also possible to update the statistics asynchronously. This means that when a query would normally trigger a statistics update, the query instead proceeds with the old statistics while the statistics are updated in the background. This can speed up the response time of some queries, such as when the database is large or when you have a short timeout period.
When you have large data sets, usually measured in millions of rows or more, you can modify how often statistics are updated. Instead of the fixed 20 percent threshold for updates, you can get a sliding scale that requires an ever smaller percentage of changes as the number of rows grows. This ensures that you see more frequent statistics updates on large-scale systems. This functionality requires modifying the system's behavior at a low level by using a trace flag. The command looks like this:
DBCC TRACEON(2371,-1);
Turning on trace flag 2371 changes the automatic statistics updates from the default thresholds described earlier to the sliding-scale approach.
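If you want to confirm that the trace flag is active for the whole instance, you can check its status like this:

DBCC TRACESTATUS(2371, -1);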
You can manually disable (or enable) the auto update statistics and the auto update statistics asynchronously
features by using the ALTER DATABASE command. By default, the auto update statistics feature is enabled, and it is
strongly recommended that you keep it enabled. The auto update statistics asynchronously feature is disabled by
default. Turn this feature on only if you've determined it will help with timeouts on your database.
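As a sketch of the syntax (AdventureWorks2012 stands in for your own database name), the relevant ALTER DATABASE options look like this:

-- Keep automatic statistics updates enabled (the default)
ALTER DATABASE AdventureWorks2012 SET AUTO_UPDATE_STATISTICS ON;

-- Enable asynchronous statistics updates; use OFF to return to the default
ALTER DATABASE AdventureWorks2012 SET AUTO_UPDATE_STATISTICS_ASYNC ON;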
Note
I explain ALTER DATABASE later in this chapter in the “Manual Maintenance” section.
 
 