The density measurements are interesting. Notice that the density is close to the same for both values, but the filtered density differs slightly, reflecting fewer unique values. This is because the filtered data, while marginally less selective, is actually more accurate, eliminating all the empty values that won't contribute to a search. And the density of the second value, which represents the addition of the clustered index pointer, is identical to the density of PurchaseOrderNumber alone because each represents the same amount of unique data. The density of the same combination in the unfiltered statistics is a much smaller number because it includes all the unique values of SalesOrderID that are excluded from the filtered data by the elimination of the NULL values. You can also see that the first value in the histogram is NULL in Figure 12-23 but is an actual value in Figure 12-24.
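If you want to inspect these density values yourself, the density vector can be queried directly. The following is a minimal sketch; it assumes the filtered index IX_Test created earlier in this chapter still exists:
-- Show only the density vector from the statistics on the filtered index
DBCC SHOW_STATISTICS('Sales.SalesOrderHeader', 'IX_Test') WITH DENSITY_VECTOR;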
One other option open to you is to create filtered statistics, which let you build even more finely tuned histograms. This can be especially useful on partitioned tables, because statistics are not automatically created for individual partitions and you can't target a single partition with CREATE STATISTICS directly. Instead, you can create filtered indexes by partition and get statistics from them, or create filtered statistics specifically for each partition.
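As a sketch of the syntax, the following creates a filtered statistics object on PurchaseOrderNumber that covers only the rows that actually have a purchase order number; the name FS_PurchaseOrderNumber is just an illustrative choice:
-- Manually created statistics limited by a WHERE clause,
-- so the histogram describes only the non-NULL values
CREATE STATISTICS FS_PurchaseOrderNumber
ON Sales.SalesOrderHeader (PurchaseOrderNumber)
WHERE PurchaseOrderNumber IS NOT NULL;
When you're done experimenting, it can be removed with DROP STATISTICS Sales.SalesOrderHeader.FS_PurchaseOrderNumber;.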
Before going on, clean up any indexes created earlier:
DROP INDEX Sales.SalesOrderHeader.IX_Test;
Cardinality
The statistics, consisting of the histogram and density, are used by the query optimizer to calculate how many rows are expected by each process (called an operation) within the execution of the query. This calculation to determine the number of rows returned is called the cardinality estimate. Cardinality represents the number of rows in a set of data, which means it's directly related to the density measures in SQL Server. Starting in SQL Server 2014, a new cardinality estimator is at work. This is the first change to the core cardinality estimation process since SQL Server 7.0. The changes to some areas of the estimator mean that the optimizer reads from the statistics in the same way as before, but, depending on which of the cardinality calculations have been modified, it performs different kinds of calculations to determine the number of rows that will pass through each operation in the execution plan.
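To see the result of the cardinality estimate, you can capture an estimated plan and look at the estimated number of rows on each operator. This is only a sketch; the PurchaseOrderNumber literal is a hypothetical value, so substitute one that exists in your copy of AdventureWorks:
-- Return the estimated execution plan, including row estimates,
-- without actually running the query
SET SHOWPLAN_XML ON;
GO
SELECT soh.SalesOrderID
FROM Sales.SalesOrderHeader AS soh
WHERE soh.PurchaseOrderNumber = 'PO12345'; -- hypothetical value
GO
SET SHOWPLAN_XML OFF;
GO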
Most of the time this data is pulled from the histogram. In the case of a single predicate, the values simply use the selectivity defined by the histogram. But when multiple columns are used for filtering, the cardinality calculation has to take into account the potential selectivity of each column. Prior to SQL Server 2014, a couple of simple calculations were used to determine cardinality. For an AND combination, the calculation multiplied the selectivity of the first column by the selectivity of the second, and so on, something like this:
Selectivity1 * Selectivity2 * Selectivity3 ...
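To take a made-up example, against a 100,000-row table with two predicates whose selectivities are 0.1 and 0.01, this independence assumption produces an estimate of 100,000 * 0.1 * 0.01 = 100 rows.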
An OR calculation between two columns was more complex. Starting with SQL Server 2014, the calculation for an AND combination changes; the new calculation looks like this:
Selectivity1 * Power(Selectivity2,1/2) * Power(Selectivity3,1/4) ...
In short, instead of simply multiplying the selectivities to produce ever more selective data, a softer calculation is used: the most selective column is applied at full value, and each less selective column is dampened by raising its selectivity to the power of 1/2, then 1/4, then 1/8, and so on, depending on how many columns of data are involved. The result is a softer, less skewed estimate. It won't change every execution plan generated, but the more accurate estimates can change plans in some places.
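Continuing the made-up example from above, the new calculation takes the most selective predicate at full value and dampens the other: 100,000 * 0.01 * Power(0.1, 1/2), which is roughly 100,000 * 0.01 * 0.316, or about 316 rows, a noticeably higher estimate than the 100 rows produced by straight multiplication.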
In SQL Server 2014, several sets of new calculations take place. This means that for most queries, on average, you may see performance improvements if your statistics are up-to-date, because more accurate cardinality calculations mean the optimizer makes better choices. But you may also see performance degradation with some queries because of the changes in the way cardinality is calculated. This is to be expected given the wide variety of workloads, schemas, and data distributions you may encounter.
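If you want to compare the two estimators on a specific query, one approach, sketched below, is to run the same statement under each and compare the estimated row counts in the plans. On SQL Server 2014, trace flag 9481 forces the legacy estimator and trace flag 2312 forces the new one (QUERYTRACEON requires elevated permissions); the filter values here are only examples:
-- Same query under the legacy (pre-2014) cardinality estimator
SELECT soh.SalesOrderID
FROM Sales.SalesOrderHeader AS soh
WHERE soh.SalesPersonID = 279
AND soh.TerritoryID = 5
OPTION (QUERYTRACEON 9481);

-- Same query under the new (SQL Server 2014) cardinality estimator
SELECT soh.SalesOrderID
FROM Sales.SalesOrderHeader AS soh
WHERE soh.SalesPersonID = 279
AND soh.TerritoryID = 5
OPTION (QUERYTRACEON 2312);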