Statistics and Cost Estimation - Inside the SQL Server Query Optimizer

Databases Reference

In-Depth Information

Histograms

In SQL Server, histograms are created only for the first column of a statistics object,

and they compress the information of the distribution of values in those columns by

partitioning that information into subsets called buckets or steps . The maximum number

of steps in a histogram is 200, but even if the input has 200 or more unique values, a

histogram may still have less than 200 steps. To build the histogram, SQL Server finds

the unique values in the column and tries to capture the most frequent ones using a

variation of the maxdiff algorithm, so that the most statistically significant information

is preserved. Maxdiff is one of the available histograms whose purpose is to accurately

represent the distribution of data values in relational databases.

To see how the histogram is used, run the following statement to display the

current statistics of the IX_SalesOrderDetail_ProductID index on the

Sales.SalesOrderDetail table:

DBCC SHOW_STATISTICS ( 'Sales.SalesOrderDetail' , IX_SalesOrderDetail_ProductID )

Listing 3-11.

Both the multi-column index and statistics objects include the columns ProductID ,

SalesOrderID , and SalesOrderDetailID , but since the histogram is only for the

first column, this data is only available for the ProductID column.

Next, I will show you some examples of how the histogram may be used to estimate the

cardinality of some simple predicates. Let's take a look at a section of the histogram, as

shown in the output in Listing 3-12.

Search WWH ::

Custom Search

Home