Chapter 18
Problem Diagnosis
How many times have DBAs had to open priority-one service requests with Oracle Support for critical errors encountered
while supporting their production environments? Errors are bound to happen; as much as we would all like to see
one, there is no such thing as a perfect, bug-free application. Critical errors can be caused by misconfiguration,
uncontrolled environments, or human error. When they occur, they interrupt production, cause downtime,
and slow performance, all of which affects the credibility of the DBA, the system administrators, or the application in general.
So when problems do arise, it is important that there is an immediate remedy that brings the database back into operation quickly,
and that the error is then fixed through patches or code changes, operational procedures, or configuration changes that
ensure it does not happen again.
RAC is a multi-instance clustered configuration. As discussed in the previous chapters, apart from
single-instance-related issues, RAC can have issues that span multiple instances, and a problem on one node can cause
a problem on another node. On the positive side, the advantage of RAC over a single-instance
configuration is that if one instance fails or is unhealthy, there is always another instance that users can connect to
and use the database.
To help the DBA troubleshoot issues with the environment, Oracle provides utilities that help gather statistics
across all instances. Most of the utilities that focus on database performance-related statistics were discussed in
Chapter 6. There are other scripts and utilities that collect statistics and diagnostic information to help troubleshoot
and get to the root cause of problems.
The data gathered through these utilities helps narrow down where the potential problem lies.
Health Monitor
Probably ever since the database was invented, the DBA's everyday routine has included checking the health of the
database. On certain days, such as Mondays, additional checks and scripts were executed compared to the
other days. This is a task that every DBA performs, or has performed at some point in his or her career, irrespective
of whether the database was Oracle, DB2, SQL Server, or Sybase. To complete these routine everyday tasks, scripts of several
types and flavors have been written. Scanning the alert logs for ORA- error messages, purging
trace files, and opening service requests with Oracle Support have been regular tasks of the day. In 11g Release 2 of
the database, Oracle introduced a few new features that make some of these DBA tasks easier than before.
Just as the DBA writes scripts to check various areas of the database (for space, for errors, for locks, and
so forth), the health monitor (HM) provided by Oracle validates most of the areas and components of the database.
Checks performed by the HM cover file corruption, physical and logical corruption, undo and redo corruption,
and so forth. Besides identifying problems, the HM also produces a report of its findings and recommendations.
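As a minimal sketch of where these findings and recommendations can be inspected, the queries below read the V$HM_RUN, V$HM_FINDING, and V$HM_RECOMMENDATION views. The column selection and ordering are illustrative choices rather than a prescribed report; the views return rows only after at least one check has run.

```sql
-- Recent Health Monitor runs, newest first
SELECT run_id, name, check_name, run_mode, status, start_time
FROM   v$hm_run
ORDER  BY start_time DESC;

-- Findings recorded by those runs
SELECT run_id, finding_id, priority, status, description
FROM   v$hm_finding
ORDER  BY time_detected DESC;

-- Recommendations attached to the findings (fdg_id points at the finding)
SELECT run_id, fdg_id, rank, status, description
FROM   v$hm_recommendation
ORDER  BY run_id, rank;
```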
The HM can be invoked in one of two ways: reactively, where the checks are performed automatically by the
database when a critical error is encountered, or manually, where the DBA runs a specific
check at a chosen time, for example on becoming suspicious about issues in a certain area of the database.
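A manual run can be sketched with the DBMS_HM package, as shown below. The run name hm_manual_run_01 is a hypothetical label chosen for this example, and the Dictionary Integrity Check is used only as one representative check; the SET commands are SQL*Plus settings that keep the report from being truncated.

```sql
-- Run one health check manually under a run name of our choosing
-- ('hm_manual_run_01' is a hypothetical name for illustration)
BEGIN
  DBMS_HM.RUN_CHECK(
    check_name => 'Dictionary Integrity Check',
    run_name   => 'hm_manual_run_01');
END;
/

-- Display the text report for that run (SQL*Plus formatting settings)
SET LONG 100000
SET LONGCHUNKSIZE 1000
SET PAGESIZE 1000
SELECT DBMS_HM.GET_RUN_REPORT('hm_manual_run_01') FROM dual;
```

Run names must be unique, so a repeated run would need a different name.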
Apart from the distinction between reactive and manual checks, some checks can be run
when the database is online and others can be executed when the database is offline. The list of checks and their
execution types can be viewed by querying the V$HM_CHECK view.
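A minimal sketch of such a query follows. In this view, INTERNAL_CHECK = 'N' marks checks that the DBA can run manually, and OFFLINE_CAPABLE indicates whether a check can run while the database is offline; the column selection here is an illustrative choice.

```sql
-- List the available health checks and how each can be executed
SELECT name,
       internal_check,   -- 'Y' = run only by the database itself
       offline_capable,  -- 'Y' = can run while the database is offline
       description
FROM   v$hm_check
ORDER  BY name;

-- Only the checks that can be invoked manually
SELECT name
FROM   v$hm_check
WHERE  internal_check = 'N';
```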
 