Database Reference
In-Depth Information
Density
When creating an execution plan, the query optimizer analyzes the statistics of the columns used in the filter and JOIN
clauses. A filter criterion with high selectivity limits the number of rows from a table to a small result set and helps
the optimizer keep the query cost low. A column with a unique index will have a high selectivity since it can limit the
number of matching rows to one.
On the other hand, a filter criterion with low selectivity will return a large result set from the table. A filter
criterion with low selectivity makes a nonclustered index on the column ineffective. Navigating through a
nonclustered index to the base table for a large result set is usually costlier than scanning the base table (or clustered
index) directly because of the cost overhead of lookups associated with the nonclustered index. You can observe this
behavior in the execution plan in Figure 12-18 .
Statistics track the selectivity of a column in the form of a density ratio. A column with high selectivity
(or uniqueness) will have low density. A column with low density (that is, high selectivity) is suitable for a filtering
criteria because it can help the optimizer retrieve a small number of rows very fast. This is also the principle on which
filtered indexes operate since the filter's goal is to increase the selectivity, or density, of the index.
Density can be expressed as follows:
Density = 1 / Number of distinct values for a column
Density will always come out as a number somewhere between 0 and 1. The lower the column density, the more
suitable it is for use as an index key. You can perform your own calculations to determine the density of columns
within your own indexes and statistics. For example, to calculate the density of column c1 from the test table built by
create_t3.sql , use the following (results in Figure 12-19 ):
SELECT 1.0 / COUNT(DISTINCT C1)
FROM dbo.Test1;
Figure 12-19. Results of density calculation for column C1
You can see this as actual data in the All density column in the output from DBCC SHOW_ STATISTICS . This
high-density value for the column makes it a less suitable candidate for an index, even a filtered index. However, the
statistics of the index key values maintained in the steps help the query optimizer use the index for the predicate
C1 = 1 , as shown in the previous execution plan.
Statistics on a Multicolumn Index
In the case of an index with one column, statistics consist of a histogram and a density value for that column. Statistics
for a composite index with multiple columns consist of one histogram for the first column only and multiple density
values. This is one reason why it's generally a good practice to put the more selective column, the one with the lowest
density, first when building a compound index or compound statistics. The density values include the density for the
first column and for each prefix combination of the index key columns. Multiple density values help the optimizer
find the selectivity of the composite index when multiple columns are referred to by predicates in the WHERE and JOIN
clauses. Although the first column can help determine the histogram, the final density of the column itself would be
the same regardless of column order.
 
Search WWH ::




Custom Search