Database Reference
In-Depth Information
SQL> SELECT statement_id, cardinality
2 FROM plan_table
3 WHERE id = 0;
STATEMENT_ID CARDINALITY
------------ -----------
101 80
102 80
103 80
104 80
105 600
106 80
Therefore, in practice, height-balanced histograms may not only be misleading but may also lead to instability
in query optimizer estimations. For this reason, as of version 12.1, top frequency histograms and hybrid histograms
replace height-balanced histograms.
Top Frequency Histograms
One of the key characteristics of a frequency histogram is that every value is represented in the histogram. Even
though they're accurate, because of the limit in the number of buckets, sometimes they can't be created. The concept
behind top frequency histograms is that, in case some of the values represent a small percentage of the data, they
can be safely discarded because they're statistically insignificant. And, if it's possible to discard enough values to void
surpassing the limit of the number of buckets, a top frequency histogram, which is based only on the top-n values,
may be created.
To determine whether a histogram with n buckets is sufficiently accurate, the database engine checks whether
those n values represent at least p percent of the rows, where p is computed with Formula 8-1. For example, a top
frequency histogram like the one gathered on the val3 column, with its 5 buckets, has to represent at least 80% (100 -
100/5) of the rows.
Formula 8-1. Minimum percentage of rows that have to be represented by the top-n values
100
p
=-
100
n
In the case of the val3 column, five buckets are sufficient because, as the output of the following query shows, the
top-3 values already account for more than 80% of the rows:
SQL> SELECT val3, count(*) AS frequency, ratio_to_report(count(*)) OVER ()*100 AS percent
2 FROM t
3 GROUP BY val3
4 ORDER BY val3;
VAL3 FREQUENCY PERCENT
---------- --------- ----------
101 8 0.8
102 25 2.5
103 68 6.8
104 185 18.5
105 502 50.2
 
Search WWH ::




Custom Search