Database Reference
In-Depth Information
Figure 12-16. Output from missing_column_statistics Extended Events event
The column_list will show which columns did not have statistics, and the sql_text event field will show
which query it is applicable to. You can then decide whether you want to create your own statistics to benefit the
query in question.
Before proceeding, be sure to turn the automatic creation of statistics back on.
ALTER DATABASE AdventureWorks2012 SET AUTO_CREATE_STATISTICS ON;
Analyzing Statistics
Statistics are collections of information defined within three sets of data: the header, the density graph, and the
histograms. One of the most commonly used of these data sets is the histogram. A histogram is a statistical construct
that shows how often data falls into varying categories called steps . The histogram stored by SQL Server consists of a
sampling of data distribution for a column or an index key (or the first column of a multicolumn index key) of up to
200 rows. The information on the range of index key values between two consecutive samples is one step . These steps
consist of varying size intervals between the 200 values stored. A step provides the following information:
The top value of a given step (
RANGE_HI_KEY )
The number of rows equal to
RANGE_HI_KEY (EQ_ROWS)
The number of rows between the previous top value and the current top value, without
counting either of these boundary points ( RANGE_ROWS )
The number of distinct values in the range (
DISTINCT_RANGE_ROWS ); if all values in the range
are unique, then RANGE_ROWS equals DISTINCT_RANGE_ROWS
The average number of rows equal to any potential key value within a range (
AVG_RANGE_ROWS )
For example, when referencing an index, the value of AVG_RANGE_ROWS for a key value within a step in the
histogram helps the optimizer decide how (and whether) to use the index when the indexed column is referred to in a
WHERE clause. Because the optimizer can perform a SEEK or SCAN operation to retrieve rows from a table, the optimizer
can decide which operation to perform based on the number of potential matching rows for the index key value. This
can be even more precise when referencing the RANGE_HI_KEY since the optimizer can know that it should find a fairly
precise number of rows from that value (assuming the statistics are up-to-date).
To understand how the optimizer's data retrieval strategy depends on the number of matching rows, create a test
table with different data distributions on an indexed column.
 
Search WWH ::




Custom Search