Statistics and Cost Estimation - Inside the SQL Server Query Optimizer

Databases Reference

In-Depth Information

Going back to the histogram, EQ_ROWS is the estimated number of rows whose column

value equals RANGE_HI_KEY . So, in our example, for the RANGE_HI_KEY value of 831,

EQ_ROWS shows 198, which we know is also the actual number of existing records for

ProductID 831.

RANGE_ROWS is the estimated number of rows whose column value falls inside the range

of the step, excluding the upper boundary. In our example, this is the number of records

with values from 827 to 830 (831, the upper boundary or RANGE_HI_KEY , is excluded).

The histogram shows 110 records and we could obtain the same value by getting the

sum of 31 records for ProductID 827, 46 records for ProductID 828, 0 records for

ProductID 829, and 33 records for ProductID 830.

DISTINCT_RANGE_ROWS is the estimated number of rows with a distinct column value

inside this range, once again excluding the upper boundary. In our example, we have

records for three distinct values: 827, 828, and 830, so DISTINCT_RANGE_ROWS is 3.

There are no records for ProductID 829, and 831, which is the upper boundary, is again

excluded.

Finally, AVG_RANGE_ROWS is the average number of rows per distinct value,

excluding the upper boundary, and it is simply calculated as RANGE_ROWS / DISTINCT_

RANGE_ROWS . In our example, we have a total of 110 records for 3 DISTINCT_RANGE_

ROWS , which gives us 110 / 3 = 36.6667, also shown in the second step of the histogram

shown previously.

Now let's see how the histogram is used to estimate the selectivity of some queries.

Let us see the first query:

SELECT * FROM Sales . SalesOrderDetail

WHERE ProductID = 831

Listing 3-15.

Search WWH ::

Custom Search

Home