Database Reference
In-Depth Information
Here are the main characteristics of height-balanced histograms:
The number of buckets is less than the number of distinct values. For each bucket, except
when they're compressed, a row with the endpoint number is available in a view like
user_tab_histograms . In addition, the endpoint number 0 indicates the minimum value.
The
endpoint_value column gives a numerical representation of the value itself.
For additional information about this column, refer to the description provided in the
“Height-Balanced Histograms” section.
The
endpoint_number column gives the bucket number.
The histogram doesn't store the frequency of the values.
The following example shows the estimations performed by the query optimizer when the height-balanced
histogram is in place. Note the lower precision compared to the frequency histogram:
SQL> EXPLAIN PLAN SET STATEMENT_ID '101' FOR SELECT * FROM t WHERE val2 = 101;
SQL> EXPLAIN PLAN SET STATEMENT_ID '102' FOR SELECT * FROM t WHERE val2 = 102;
SQL> EXPLAIN PLAN SET STATEMENT_ID '103' FOR SELECT * FROM t WHERE val2 = 103;
SQL> EXPLAIN PLAN SET STATEMENT_ID '104' FOR SELECT * FROM t WHERE val2 = 104;
SQL> EXPLAIN PLAN SET STATEMENT_ID '105' FOR SELECT * FROM t WHERE val2 = 105;
SQL> EXPLAIN PLAN SET STATEMENT_ID '106' FOR SELECT * FROM t WHERE val2 = 106;
SQL> SELECT statement_id, cardinality
2 FROM plan_table
3 WHERE id = 0;
STATEMENT_ID CARDINALITY
------------ -----------
101 50
102 50
103 50
104 50
105 400
106 300
You might expect the cardinality estimation of the values 105 and 106 to be identical (400, because both
popular values cover 2/5 of the buckets). however, for the value 106, this isn't the case. That's because the query
optimizer adjusts its estimations when a popular value is also the maximum value represented in the histogram.
Note
Also for this type of histogram, let's have a look at what happens when values not represented in the histogram
are used. There are two distinct situations to consider. First, if the value is between the minimum and the maximum
values, the query optimizer uses the same frequency as for the other non-popular values. Second, if the value is
outside the range covered by the histogram, the frequency depends on the distance from the lowest/maximum value.
The following example illustrates:
SQL> EXPLAIN PLAN SET STATEMENT_ID '096' FOR SELECT * FROM t WHERE val2 = 96;
SQL> EXPLAIN PLAN SET STATEMENT_ID '098' FOR SELECT * FROM t WHERE val2 = 98;
SQL> EXPLAIN PLAN SET STATEMENT_ID '100' FOR SELECT * FROM t WHERE val2 = 100;
 
 
Search WWH ::




Custom Search