How frequently object statistics should be gathered is also a matter of opinion. I have seen everything from hourly to monthly (or even less frequent) schedules being successful. It really depends on your data. In any case, when the staleness of the tables is used as the basis for regathering object statistics, intervals that are too long can lead to an excess of stale objects, which in turn leads to excessive time required for statistics gathering and a peak in resource usage. For this reason, I like to schedule gathering runs frequently (to spread out the load) and keep single runs as short as possible. If your system has daily or weekly low-utilization periods, then scheduling runs during those periods is usually a good thing. If your system is a true 7×24 system, then it's usually better to use very frequent schedules (many times per day) to spread out the load as much as possible and avoid peaks.
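For example, you can check which tables are currently considered stale by querying the data dictionary. The following sketch uses the dba_tab_statistics view; note that the stale_stats column is based on the tracked modifications, which are flushed to the data dictionary only periodically (dbms_stats.flush_database_monitoring_info can be called to make them current):

SELECT owner, table_name, last_analyzed
FROM dba_tab_statistics
WHERE stale_stats = 'YES'
ORDER BY last_analyzed;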
If you have jobs that load or modify lots of data (for example, the ETL jobs in a data warehouse environment), you
shouldn't wait for a scheduled gathering of object statistics. Simply make the gathering of statistics for the modified
objects part of the job itself. In other words, if you know that something has substantially changed, trigger the
gathering of statistics immediately.
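For example, at the end of a load job you might call the dbms_stats package directly. In this sketch the schema and table names are illustrative; cascade => TRUE also gathers the statistics of the dependent indexes:

BEGIN
  dbms_stats.gather_table_stats(
    ownname => 'DWH',
    tabname => 'SALES_FACT',
    cascade => TRUE
  );
END;
/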
If for some good reason object statistics shouldn't be gathered on some tables, you should lock them. In this way,
the job that regularly gathers object statistics will simply skip them. This is much better than completely deactivating
the job for the whole database.
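A call like the following (table name illustrative) locks the statistics of a single table; subsequent gathering runs skip it unless the force parameter is explicitly set:

BEGIN
  dbms_stats.lock_table_stats(ownname => 'DWH', tabname => 'SALES_FACT');
END;
/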
You should take advantage of the default gathering job as much as possible. For this to meet your requirements,
you should check, and if necessary change, the default configuration. Because a configuration at the object level is
possible only as of version 11.1, if you have particular requirements for some tables in prior versions, you should
schedule a job that processes them before the default job. In this way, the default job, which processes only those
tables with stale statistics, will simply skip them. Locks might also be helpful to ensure that only a specific job is
regathering object statistics on those critical tables.
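As of version 11.1, such per-table requirements can be expressed as preferences that the default job honors. The following sketch (schema, table, and preference value are illustrative) instructs the default job not to create histograms on one specific table:

BEGIN
  dbms_stats.set_table_prefs(
    ownname => 'DWH',
    tabname => 'SALES_FACT',
    pname   => 'METHOD_OPT',
    pvalue  => 'FOR ALL COLUMNS SIZE 1'
  );
END;
/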
If you're thinking about completely disabling the default gathering job, you should instead set the autostats_target preference to oracle. That way, you let the database engine take care of the data dictionary, and for the other tables, you set up a specific job that does what you expect.
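As of version 11.1, the preference can be set globally as follows:

BEGIN
  dbms_stats.set_global_prefs(pname => 'AUTOSTATS_TARGET', pvalue => 'ORACLE');
END;
/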
If gathering statistics leads to inefficient execution plans, you can do two things. The first is to fix the problem by restoring the object statistics that were in use before the gathering. The second is to find out why
inefficient execution plans are generated by the query optimizer with the new object statistics. To do this, you should first
check whether the newly gathered statistics correctly describe the data. For example, it's possible that sampling along with
a new data distribution will lead to different histograms. If object statistics aren't good, then the gathering itself, or possibly
a parameter used for their gathering, is the problem. If the object statistics are in fact good, there are two more possible
causes. Either the query optimizer isn't correctly configured or the query optimizer is making a mistake. You have little
control over the latter, but you should be able to find a solution for the former. In any case, you should avoid hastily concluding that gathering object statistics is inherently problematic and, as a result, stopping regular gathering.
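Restoring is possible because the dbms_stats package keeps a history of the statistics it overwrites. A sketch of the first option (names and the point in time are illustrative):

BEGIN
  dbms_stats.restore_table_stats(
    ownname         => 'DWH',
    tabname         => 'SALES_FACT',
    as_of_timestamp => systimestamp - INTERVAL '1' DAY
  );
END;
/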
The best practice is to gather object statistics with the dbms_stats package. However, there are situations where
the correct object statistics may be misleading for the query optimizer. A common example is data for which a history
must be kept online for a long time (for instance, in Switzerland some types of data must be kept for at least ten years).
In such cases, if the data distribution hardly changes over time, the object statistics gathered by the dbms_stats
package should be fine. In contrast, if the data distribution is strongly dependent on the period and the application
frequently accesses only a subset of the data, it could make sense to manually modify (that is, fudge) object statistics
to describe the most relevant data. In other words, if you know something that the dbms_stats package ignores or isn't
able to discover, it's legitimate to inform the query optimizer by fudging the object statistics.
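A minimal sketch of such fudging with the dbms_stats package (all names and values are illustrative); locking the statistics afterward prevents the next gathering run from overwriting the manual figures:

BEGIN
  dbms_stats.set_table_stats(
    ownname => 'DWH',
    tabname => 'ORDERS_HISTORY',
    numrows => 100000,
    numblks => 5000
  );
  dbms_stats.lock_table_stats(ownname => 'DWH', tabname => 'ORDERS_HISTORY');
END;
/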
On to Chapter 9
This chapter describes what table statistics, column statistics, histograms, and index statistics are and how they in turn
describe the data stored in the database. It also covers how to gather object statistics with the dbms_stats package and
where to find them in the data dictionary.
This chapter gives little information about the utilization of object statistics. Such information is provided in
Chapter 9, along with a description of the initialization parameters that configure the query optimizer. After reading
that chapter, you should be able to correctly configure the query optimizer and, as a result, get efficient execution plans.