Database Reference
In-Depth Information
Figure 12-16.
Output from missing_column_statistics Extended Events event
The
column_list
will show which columns did not have statistics, and the
sql_text
event field will show
which query it is applicable to. You can then decide whether you want to create your own statistics to benefit the
query in question.
Before proceeding, be sure to turn the automatic creation of statistics back on.
ALTER DATABASE AdventureWorks2012 SET AUTO_CREATE_STATISTICS ON;
Analyzing Statistics
Statistics are collections of information defined within three sets of data: the header, the density graph, and the
histograms. One of the most commonly used of these data sets is the histogram. A
histogram
is a statistical construct
that shows how often data falls into varying categories called
steps
. The histogram stored by SQL Server consists of a
sampling of data distribution for a column or an index key (or the first column of a multicolumn index key) of up to
200 rows. The information on the range of index key values between two consecutive samples is one step
.
These steps
consist of varying size intervals between the 200 values stored. A step provides the following information:
•
The top value of a given step (
RANGE_HI_KEY
)
•
The number of rows equal to
RANGE_HI_KEY (EQ_ROWS)
The number of rows between the previous top value and the current top value, without
•
counting either of these boundary points (
RANGE_ROWS
)
The number of distinct values in the range (
•
DISTINCT_RANGE_ROWS
); if all values in the range
are unique, then
RANGE_ROWS
equals
DISTINCT_RANGE_ROWS
The average number of rows equal to any potential key value within a range (
AVG_RANGE_ROWS
)
For example, when referencing an index, the value of
AVG_RANGE_ROWS
for a key value within a step in the
histogram helps the optimizer decide how (and whether) to use the index when the indexed column is referred to in a
WHERE
clause. Because the optimizer can perform a
SEEK
or
SCAN
operation to retrieve rows from a table, the optimizer
can decide which operation to perform based on the number of potential matching rows for the index key value. This
can be even more precise when referencing the
RANGE_HI_KEY
since the optimizer can know that it should find a fairly
precise number of rows from that value (assuming the statistics are up-to-date).
To understand how the optimizer's data retrieval strategy depends on the number of matching rows, create a test
table with different data distributions on an indexed column.
•