Gathering Statistics During a Rebuild
You can gather statistics on an index while rebuilding the index, as the following example shows:
SQL> alter index hr.emp_emp_id_pk rebuild compute statistics;
Index altered.
You save time by having the database gather statistics while it's rebuilding the index.
The METHOD_OPT Parameter
The METHOD_OPT parameter of the DBMS_STATS package gives you control over statistics collection as it relates to columns. You specify the METHOD_OPT parameter to tell the database two things:
• The columns for which it should collect statistics
• Whether it should collect a histogram for any column and, if so, how many buckets should be in the histogram
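For instance, the following call collects statistics on a table with an explicit METHOD_OPT setting (for all columns size auto shown here is the default value; the HR.EMPLOYEES table is used purely for illustration):

SQL> exec dbms_stats.gather_table_stats(ownname=>'HR', tabname=>'EMPLOYEES', method_opt=>'for all columns size auto')
PL/SQL procedure successfully completed.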
Often, DBAs specify the for all indexed columns size auto value for the METHOD_OPT parameter when collecting table statistics. Greg Rahn shows why this is often a bad idea in his article on the use of this parameter (http://structureddata.org/2008/10/14/dbms-stats-method_opt-and-for-all-indexed_columns/).
Specifying the value for all indexed columns size auto for the METHOD_OPT parameter has the
following implications:
• It tells the database to collect statistics only for those columns that are indexed.
• It lets the database determine whether it should collect histograms, as well as the number of buckets it should allocate for a histogram.
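For example, a call such as the following (again using HR.EMPLOYEES purely for illustration) restricts column statistics collection to the indexed columns:

SQL> exec dbms_stats.gather_table_stats(ownname=>'HR', tabname=>'EMPLOYEES', method_opt=>'for all indexed columns size auto')
PL/SQL procedure successfully completed.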
When you specify the for all indexed columns size auto option, the database collects no statistics on the unindexed columns; it simply uses default values for the number of distinct values and the density of those columns. As a result, the optimizer may often end up with a vastly underestimated number of rows. When you get an execution plan with the help of the DBMS_XPLAN.DISPLAY_CURSOR function (run with the ALLSTATS LAST option), it'll often show that the number of estimated rows (E-Rows) is underestimated when compared to the number of actual rows (A-Rows). Wrong cardinality estimates are very likely to play havoc with the selection of access paths, join methods, and join order, leading to poor SQL execution times. The whole idea behind collecting optimizer statistics is to collect representative statistics. The database doesn't really know your data; you do! (If you really want to understand how to collect good statistics, please read the excellent article on the use of the DBMS_STATS package by Karen Morton titled “Managing Statistics for Optimal Query Performance,” available on the method-r.com web site.)
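As a sketch of how to compare estimates with actuals, you can run a query with the gather_plan_statistics hint (which makes the database collect row-level execution statistics) and then display the plan for the most recently executed statement; the query against HR.EMPLOYEES is purely illustrative:

SQL> select /*+ gather_plan_statistics */ count(*) from hr.employees where department_id = 50;
SQL> select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));

The resulting plan output shows the E-Rows and A-Rows columns side by side for each step, so you can spot the steps where the optimizer's cardinality estimates diverge from reality.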
The cost-based optimizer is very likely to produce suboptimal execution plans when it isn't fed accurate statistics. Inaccurate cardinality estimates are a good example of the consequences of collecting nonrepresentative statistics. For example, a wrong cardinality estimate can lead the optimizer to select the wrong driving table. Alternatively, the optimizer may decide that a nested loops join is better when a hash join would be more appropriate, especially when dealing with large data sets.