Database Reference
In-Depth Information
Extended Statistics
The column statistics and histograms described in the previous sections are helpful only when column values are
used without being modified in predicates. For example, if the predicate country='Switzerland' is used, with
column statistics and a histogram in place for the country column, the query optimizer should be able to correctly
estimate its selectivity. This is because column statistics and the histogram describe the values of the country
column itself. On the other hand, if the predicate upper(country)='SWITZERLAND' is used, the query optimizer
is no longer able to directly infer the selectivity from the object statistics and the histogram. A similar problem
occurs when a predicate references several columns. For example, if I apply the predicate country='Denmark' AND
language='Danish' to a table containing people from all over the world, it's likely that the two restrictions apply
to the same rows for most rows in such a table. In fact, most people speaking Danish live in Denmark, and most
people living in Denmark speak Danish. In other words, the two restrictions are almost redundant. Such columns
are commonly called correlated columns and challenge the query optimizer. This is because no object statistics or
histograms describe such a dependency between data—or put another way, the query optimizer actually assumes
that data stored in different columns isn't interdependent.
As of version 11.1, it's possible to gather object statistics and histograms on expressions or on groups of columns
to solve these kinds of problems. These new statistics are called extended statistics . Basically, what happens is that a
hidden column, called an extension , is created, based on either an expression or a group of columns. Then regular
object statistics and histograms are gathered on it.
The definition is carried out with the create_extended_stats function of the dbms_stats package. For example,
two extensions are created with the following query. The first one is on upper(pad) , and the second one is a column
group made up of the columns val2 and val3 . In the test table, these contain exactly the same values; in other words,
the columns are highly (actually, perfectly) correlated. For the definition, as shown next, the expression or group
of columns must be enclosed in parentheses. Notice that the function returns a system-generated name for the
extension (a 30-byte name starting with SYS_STU ):
SQL> SELECT dbms_stats.create_extended_stats(ownname => user,
2 tabname => 'T',
3 extension => '(upper(pad))') AS ext1,
4 dbms_stats.create_extended_stats(ownname => user,
5 tabname => 'T',
6 extension => '(val2,val3)') AS ext2
7 FROM dual;
EXT1 EXT2
------------------------------ ------------------------------
SYS_STU0KSQX64#I01CKJ5FPGFK3W9 SYS_STUPS77EFBJCOTDFMHM8CHP7Q1
Note
A group of columns can't reference expressions or virtual columns.
Obviously, once the extensions are created, the data dictionary provides information about them. The following
query, based on the user_stat_extensions view, shows the existing extensions of the test table. There are dba , all ,
and, in a 12.1 multitenant environment, cdb versions as well:
SQL> SELECT extension_name, extension
2 FROM user_stat_extensions
3 WHERE table_name = 'T';
 
 
Search WWH ::




Custom Search