Database Reference
In-Depth Information
12 AND column_name = 'VAL1'
13 /
LOW_VALUE HIGH_VALUE
--------- ----------
39 63
density is a decimal number between 0 and 1. Values close to 0 indicate that a restriction on
that column filters out the majority of the rows. Values close to 1 indicate that a restriction on
that column filters almost no rows. If no histogram is present, density is 1/num_distinct . If
a histogram is present, the computation differs and depends on the type of histogram. In any
case, as of version 10.2.0.4, for columns with histograms, this value is only used for backward
compatibility when the optimizer_features_enable initialization parameter is set to an older
release.
num_nulls is the number of NULL values stored in the column.
avg_col_len is the average column size in bytes.
histogram indicates whether a histogram is available for the column and, if it's available,
which type it is. Valid values are NONE (meaning no histogram), FREQUENCY , HEIGHT BALANCED ,
and, as of version 12.1, TOP-FREQUENCY and HYBRID .
num_buckets is the number of buckets in the histogram. A bucket, or category as it's called in
statistics, is a group of values of the same kind. As described in the next section, histograms
are composed of at least one bucket. If no histogram is available, it's set to 1. The maximum
number of buckets is 254 up to version 11.2, and 2,048 as of version 12.1.
Histograms
The query optimizer starts from the principle that data is uniformly distributed. An example of a uniformly distributed
set of data is the one stored in the id column in the test table used throughout the previous sections. In fact, it stores all
integers from 1 up to 1,000 exactly once. In such a case, to produce a good estimate of the number of rows filtered out
by a predicate based on that column (for example, id BETWEEN 6 AND 19 ), the query optimizer requires only the object
statistics described in the preceding section: the minimum value, the maximum value, and the number of distinct values.
If data isn't uniformly distributed, the query optimizer can't compute acceptable estimations without additional
information. For example, given the data set stored in the val2 column (see the output of the following query), how
could the query optimizer make a meaningful estimation for a predicate like val2=105 ? It can't, because it has no clue
that about 50 percent of the rows fulfill that predicate:
SQL> SELECT val2, count(*)
2 FROM t
3 GROUP BY val2
4 ORDER BY val2;
VAL2 COUNT(*)
---------- ----------
101 8
102 25
103 68
104 185
 
Search WWH ::




Custom Search