as desired. Where necessary, you may even have to take manual control over the creation and/or maintenance of
statistics. (I cover this in the “Manual Maintenance” section, and I cover the precise nature of the functions and shape
of statistics in the “Analyzing Statistics” section.) In the following section, I show you why statistics are important to
indexed columns and nonindexed columns functioning as predicates.
Statistics on an Indexed Column
The usefulness of an index is largely dependent on the statistics of the indexed columns; without statistics, SQL
Server's cost-based query optimizer can't decide upon the most effective way of using an index. To meet this
requirement, SQL Server automatically creates the statistics of an index key whenever the index is created. It isn't
possible to turn this feature off.
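As a quick illustration (the dbo.Person table and the IX_Person_LastName index are placeholders), you can confirm that an index brings its statistics with it by creating the index and then querying sys.stats:

-- Create an index; SQL Server builds statistics on the index key automatically
CREATE INDEX IX_Person_LastName ON dbo.Person (LastName);

-- The new statistics object appears alongside the index in sys.stats
SELECT s.name,
       s.auto_created,
       s.user_created
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID('dbo.Person');

The statistics object carries the same name as the index, and both auto_created and user_created are 0, indicating the statistics were built as part of the index rather than by the auto create statistics feature.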
As data changes, the data retrieval mechanism required to keep the cost of a query low may also change. For
example, if a table has only one matching row for a certain column value, then it makes sense to retrieve the matching
rows from the table by going through the nonclustered index on the column. But if the data in the table changes so
that a large number of rows are added with the same column value, then using the nonclustered index may no longer
make sense. To enable SQL Server to decide on this change in processing strategy as the data changes over time, it is vital to have up-to-date statistics.
SQL Server can keep the statistics on an index updated as the contents of the indexed column are modified.
By default, this feature is turned on and is configurable through the Properties ➤ Options ➤ Auto Update Statistics setting of a database. Updating statistics consumes extra CPU cycles and associated I/O. To optimize the update process, SQL Server uses an efficient algorithm to decide when to execute the update statistics procedure, based on factors such as the number of modifications and the size of the table. Statistics are updated automatically in the following cases:
•	When a table with no rows gets a row
•	When a table has fewer than 500 rows and is increased by 500 or more rows
•	When a table has more than 500 rows and is increased by 500 rows + 20 percent of the number of rows
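For example, a table holding 10,000 rows does not have its statistics refreshed automatically until roughly 500 + (20 percent of 10,000) = 2,500 rows have been modified. On more recent versions of SQL Server, you can check how many modifications have accumulated against each statistics object since its last update with a query like the following sketch (dbo.Person is a placeholder table name):

-- Show the last update time, row count, and modifications since the last update
SELECT s.name,
       sp.last_updated,
       sp.rows,
       sp.modification_counter
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID('dbo.Person');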
This built-in intelligence keeps the CPU utilization by each process low. It's also possible to update the statistics asynchronously. This means that when a query would normally trigger a statistics update, the query instead proceeds with the old statistics while the statistics are updated in the background. This can speed up the response time of some queries, such as when the database is large or when you have a short timeout period.
When you have large data sets, usually measured in millions of rows or more, you can modify how often statistics are updated. Instead of the fixed 20 percent threshold for updates, you can get a sliding scale that requires an ever smaller percentage of changes as the number of rows grows. This ensures that you see more frequent statistics updates on large-scale systems. This functionality requires modifying the system's behavior at a low level by using a trace flag. The command looks like this:
DBCC TRACEON(2371,-1);
Turning on trace flag 2371 changes the automatic statistics updates from the default thresholds described earlier to the sliding-scale approach.
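If you want to confirm that the trace flag is active for the whole instance, you can check its status like this:

DBCC TRACESTATUS(2371, -1);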
You can manually disable (or enable) the auto update statistics and the auto update statistics asynchronously
features by using the ALTER DATABASE command. By default, the auto update statistics feature is enabled, and it is
strongly recommended that you keep it enabled. The auto update statistics asynchronously feature is disabled by
default. Turn this feature on only if you've determined it will help with timeouts on your database.
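As a sketch of the syntax (AdventureWorks2012 stands in for your own database name), the relevant ALTER DATABASE options look like this:

-- Keep automatic statistics updates enabled (the default)
ALTER DATABASE AdventureWorks2012 SET AUTO_UPDATE_STATISTICS ON;

-- Enable asynchronous statistics updates; use OFF to return to the default
ALTER DATABASE AdventureWorks2012 SET AUTO_UPDATE_STATISTICS_ASYNC ON;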
Note
I explain ALTER DATABASE later in this chapter in the “Manual Maintenance” section.
 
 