Databases Reference
In-Depth Information
Reading Index Statistics
Index statistics are so important to the correct costing of query execution plans that it is worth a second
to refresh your memory on how to read this statistical treasure trove. There are many ways to retrieve
this information but one of the best is to use the DBCC SHOW_STATISTICS command like this:
DBCC SHOW_STATISTICS('Sales.SalesOrderheader', IX_SalesOrderHeader_SalesPersonID)
This produces three result sets of which only partial results are shown in the following explanations, for
brevity's sake. The first result set is at the index level. Here you can find when the statistics were last
updated for the index, the total number of rows, and rows sampled to build the statistics.
Updated
Rows
Rows Sampled
May 31 2007 10:26PM 31465
31465
If the date the statistics were last updated is too old or if the number of rows indicated in the statistics
metadata is not close to the current row count, then it is time to rerun statistics on this table (or better yet,
to enable auto-create statistics to do this for you). If the number of rows sampled is less than 80 percent
of the actual row count of the table, you'll need to increase the probability of accurate plan creation in
the optimizer by running the update statistics command using the FULLSCAN option. This next result set
provides density information at the column level in the index. Density is a measure of the selectivity
of any one value in the column being found in any of the rows sampled. A value of 100 percent would
indicate that all columns have the same value or density. This information is used in the first step of the
optimizer. In the sample below, from the perspective of the optimizer, 5 percent of the table needs to be
read to select any one SalesPersonId.
All density
Average Length Columns
0.05555556
0.4838392
SalesPersonID
3.178134E-05 4.483839
SalesPersonID, SalesOrderID
The next result set has the interesting histogram data. This data is calculated at the same time that the
statistics are updated, so pay attention to how current your statistics are for the best performance. The
histogram is used when the predicates of your T-SQL statement are evaluated to predict row counts
(partial results shown).
RANGE_HI_KEY RANGE_ROWS
EQ_ROWS
DISTINCT_RANGE_ROWS
AVG_RANGE_ROWS
------------ ------------- ------------- -------------------- --------------
268
0
48
0
1
275
0
450
0
1
...
288
0
16
0
1
289
0
130
0
1
290
0
109
0
1
This histrogram is easy to read. The RANGE_HI_KEY column contains actual values of the SalesPer-
sonId column. The EQ_ROWS column counts the number of rows that contain this exact number of
rows. The in-between column RANGE_ROWS is 0. This means that there are 0 rows containing val-
ues between 268 and 275 for SalesPersonId. There are also 0 distinctly different values and only 1
average value between 268 and 275. You can see a more range-based histogram if you look at the index
for CustomerId.
Search WWH ::




Custom Search