Tuning T-SQL - Professional SQL Server 2005 Performance Tuning

Databases Reference

In-Depth Information

Reading Index Statistics

Index statistics are so important to the correct costing of query execution plans that it is worth a second

to refresh your memory on how to read this statistical treasure trove. There are many ways to retrieve

this information but one of the best is to use the DBCC SHOW_STATISTICS command like this:

DBCC SHOW_STATISTICS('Sales.SalesOrderheader', IX_SalesOrderHeader_SalesPersonID)

This produces three result sets of which only partial results are shown in the following explanations, for

brevity's sake. The first result set is at the index level. Here you can find when the statistics were last

updated for the index, the total number of rows, and rows sampled to build the statistics.

Updated

Rows

Rows Sampled

May 31 2007 10:26PM 31465

31465

If the date the statistics were last updated is too old or if the number of rows indicated in the statistics

metadata is not close to the current row count, then it is time to rerun statistics on this table (or better yet,

to enable auto-create statistics to do this for you). If the number of rows sampled is less than 80 percent

of the actual row count of the table, you'll need to increase the probability of accurate plan creation in

the optimizer by running the update statistics command using the FULLSCAN option. This next result set

provides density information at the column level in the index. Density is a measure of the selectivity

of any one value in the column being found in any of the rows sampled. A value of 100 percent would

indicate that all columns have the same value or density. This information is used in the first step of the

optimizer. In the sample below, from the perspective of the optimizer, 5 percent of the table needs to be

read to select any one SalesPersonId.

All density

Average Length Columns

0.05555556

0.4838392

SalesPersonID

3.178134E-05 4.483839

SalesPersonID, SalesOrderID

The next result set has the interesting histogram data. This data is calculated at the same time that the

statistics are updated, so pay attention to how current your statistics are for the best performance. The

histogram is used when the predicates of your T-SQL statement are evaluated to predict row counts

(partial results shown).

RANGE_HI_KEY RANGE_ROWS

EQ_ROWS

DISTINCT_RANGE_ROWS

AVG_RANGE_ROWS

------------ ------------- ------------- -------------------- --------------

268

0

48

0

1

275

0

450

0

1

...

288

0

16

0

1

289

0

130

0

1

290

0

109

0

1

This histrogram is easy to read. The RANGE_HI_KEY column contains actual values of the SalesPer-

sonId column. The EQ_ROWS column counts the number of rows that contain this exact number of

rows. The in-between column RANGE_ROWS is 0. This means that there are 0 rows containing val-

ues between 268 and 275 for SalesPersonId. There are also 0 distinctly different values and only 1

average value between 268 and 275. You can see a more range-based histogram if you look at the index

for CustomerId.

Professional SQL Server 2005 Performance Tuning

Search WWH ::

Custom Search

Home