The density measurements are interesting. Notice that the density is close to the same for both values, but the filtered density differs slightly, reflecting fewer unique values. This is because the filtered data, while marginally less selective, is actually more accurate, eliminating all the empty values that won't contribute to a search. And the density of the second value, which represents the addition of the clustered index pointer, is identical to the density of PurchaseOrderNumber alone because each represents the same amount of unique data. The density of the same combination in the unfiltered statistics is a much smaller number because it includes all the unique values of SalesOrderID that are excluded from the filtered data by the elimination of the NULL values. You can also see that the first value in the histogram is NULL in Figure 12-23 but is an actual value in Figure 12-24.
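If you want to inspect these density values yourself, the density vector can be queried directly. The following is a minimal sketch; it assumes the filtered index IX_Test created earlier in this chapter still exists:
-- Show only the density vector from the statistics on the filtered index
DBCC SHOW_STATISTICS('Sales.SalesOrderHeader', 'IX_Test') WITH DENSITY_VECTOR;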
One other option open to you is to create filtered statistics, which let you build even more finely tuned histograms. This can be especially useful on partitioned tables, because statistics are not automatically created for individual partitions and you can't target a single partition with CREATE STATISTICS directly. Instead, you can create filtered indexes by partition and get statistics from them, or create filtered statistics specifically for each partition.
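As a sketch of the syntax, the following creates a filtered statistics object on PurchaseOrderNumber that covers only the rows that actually have a purchase order number; the name FS_PurchaseOrderNumber is just an illustrative choice:
-- Manually created statistics limited by a WHERE clause,
-- so the histogram describes only the non-NULL values
CREATE STATISTICS FS_PurchaseOrderNumber
ON Sales.SalesOrderHeader (PurchaseOrderNumber)
WHERE PurchaseOrderNumber IS NOT NULL;
When you're done experimenting, it can be removed with DROP STATISTICS Sales.SalesOrderHeader.FS_PurchaseOrderNumber;.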
Before going on, clean up any indexes created earlier:
DROP INDEX Sales.SalesOrderHeader.IX_Test;
Cardinality
The statistics, consisting of the histogram and density, are used by the query optimizer to calculate how many rows are expected by each process (called an operation) within the execution of the query. This calculation to determine the number of rows returned is called the cardinality estimate. Cardinality represents the number of rows in a set of data, which means it's directly related to the density measures in SQL Server. Starting in SQL Server 2014, a new cardinality estimator is at work. This is the first change to the core cardinality estimation process since SQL Server 7.0. The changes to some areas of the estimator mean that the optimizer reads from the statistics in the same way as before, but, depending on which of the cardinality calculations have been modified, it performs different kinds of calculations to determine the number of rows that will pass through each operation in the execution plan.
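To see the result of the cardinality estimate, you can capture an estimated plan and look at the estimated number of rows on each operator. This is only a sketch; the PurchaseOrderNumber literal is a hypothetical value, so substitute one that exists in your copy of AdventureWorks:
-- Return the estimated execution plan, including row estimates,
-- without actually running the query
SET SHOWPLAN_XML ON;
GO
SELECT soh.SalesOrderID
FROM Sales.SalesOrderHeader AS soh
WHERE soh.PurchaseOrderNumber = 'PO12345'; -- hypothetical value
GO
SET SHOWPLAN_XML OFF;
GO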
Most of the time this data is pulled from the histogram. In the case of a single predicate, the values simply use the selectivity defined by the histogram. But when multiple columns are used for filtering, the cardinality calculation has to take into account the potential selectivity of each column. Prior to SQL Server 2014, a couple of simple calculations were used to determine cardinality. For an AND combination, the calculation multiplied the selectivity of the first column by the selectivity of the second, and so on, something like this:
Selectivity1 * Selectivity2 * Selectivity3 ...
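To take a made-up example, against a 100,000-row table with two predicates whose selectivities are 0.1 and 0.01, this independence assumption produces an estimate of 100,000 * 0.1 * 0.01 = 100 rows.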
An OR calculation between two columns was more complex. Starting with SQL Server 2014, the calculation for an AND combination changes; the new calculation looks like this:
Selectivity1 * Power(Selectivity2,1/2) * Power(Selectivity3,1/4) ...
In short, instead of simply multiplying the selectivities to produce ever more selective data, a softer calculation is used: the most selective column is applied at full value, and each less selective column is dampened by raising its selectivity to the power of 1/2, then 1/4, then 1/8, and so on, depending on how many columns of data are involved. The result is a softer, less skewed estimate. It won't change every execution plan generated, but the more accurate estimates can change plans in some places.
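Continuing the made-up example from above, the new calculation takes the most selective predicate at full value and dampens the other: 100,000 * 0.01 * Power(0.1, 1/2), which is roughly 100,000 * 0.01 * 0.316, or about 316 rows, a noticeably higher estimate than the 100 rows produced by straight multiplication.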
In SQL Server 2014, several sets of new calculations take place. This means that for most queries, on average, you may see performance improvements if your statistics are up-to-date, because more accurate cardinality calculations mean the optimizer makes better choices. But you may also see performance degradation with some queries because of the changes in the way cardinality is calculated. This is to be expected given the wide variety of workloads, schemas, and data distributions you may encounter.
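If you want to compare the two estimators on a specific query, one approach, sketched below, is to run the same statement under each and compare the estimated row counts in the plans. On SQL Server 2014, trace flag 9481 forces the legacy estimator and trace flag 2312 forces the new one (QUERYTRACEON requires elevated permissions); the filter values here are only examples:
-- Same query under the legacy (pre-2014) cardinality estimator
SELECT soh.SalesOrderID
FROM Sales.SalesOrderHeader AS soh
WHERE soh.SalesPersonID = 279
AND soh.TerritoryID = 5
OPTION (QUERYTRACEON 9481);

-- Same query under the new (SQL Server 2014) cardinality estimator
SELECT soh.SalesOrderID
FROM Sales.SalesOrderHeader AS soh
WHERE soh.SalesPersonID = 279
AND soh.TerritoryID = 5
OPTION (QUERYTRACEON 2312);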