Problem Diagnosis - Expert Oracle RAC Performance Diagnostics and Tuning

Chapter 18

Problem Diagnosis

How many times have DBAs had to open priority-one service requests with Oracle Support for critical errors encountered while supporting their production environments? Errors are bound to happen; as much as we would all like to see one, there is no such thing as a perfect, bug-free application. Critical errors can be caused by misconfiguration, uncontrolled environments, or human error. When they occur, they interrupt production, cause downtime, and slow performance, which affects the credibility of the DBA, the system administrators, or the application in general. So when problems do arise, it is important that there is an immediate remedy, that the database is operational again quickly, and that the error is fixed through patches or code, operational procedures, or configuration changes that ensure it does not happen again.

RAC is a multi-instance clustered configuration. As discussed in the previous chapters, apart from single-instance-related issues, RAC can have issues that span multiple instances, or a problem on one node can cause a problem on another node. On the positive side, the advantage of RAC over a single-instance configuration is that if one instance fails or is unhealthy, there is always another instance that users can connect to and use the database.

To help the DBA troubleshoot issues with the environment, Oracle provides utilities that gather statistics across all instances. Most of the utilities that focus on database performance statistics were discussed in Chapter 6. Other scripts and utilities collect statistics and diagnostic information that help troubleshoot problems and get to their root cause.

The data gathered through these utilities will help diagnose where the potential problem could be.

Health Monitor

Probably ever since the database was invented, the DBA's everyday routine has included checking the health of the database. On certain days, such as Mondays, additional checks and scripts were executed compared to other days. It is a task that every DBA performs, or has performed at some point in his or her career, irrespective of whether the database was Oracle, DB2, SQL Server, or Sybase. To complete these routine tasks, scripts of several types and flavors have been written. Scanning the alert logs for ORA error messages, purging trace files, and opening service requests with Oracle Support have been regular tasks of the day. Starting with Oracle Database 11g, Oracle introduced features that make some of these DBA tasks easier than before.
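
As an illustration of the kind of home-grown script described above, the following Python sketch counts ORA error codes in alert log text. It is not an Oracle-supplied utility: the function name and the sample lines are hypothetical, and the pattern assumes that errors appear as "ORA-" followed by up to five digits.

```python
import re
from collections import Counter

# Assumption: error codes appear in the alert log as "ORA-" followed by digits.
ORA_PATTERN = re.compile(r"\bORA-\d{1,5}\b")


def count_ora_errors(lines):
    """Return a Counter mapping each ORA- code to the number of times it appears."""
    counts = Counter()
    for line in lines:
        counts.update(ORA_PATTERN.findall(line))
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines; in practice, iterate over the open alert log file.
    sample = [
        "Thread 1 advanced to log sequence 1042",
        "ORA-00600: internal error code, arguments: [...]",
        "Errors in file ...: ORA-07445: exception encountered",
        "ORA-00600: internal error code, arguments: [...]",
    ]
    for code, hits in count_ora_errors(sample).most_common():
        print(code, hits)
```

Because the function accepts any iterable of lines, it can be fed an open file object directly, so even a large alert log is read one line at a time.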

Similar to the DBA writing scripts to check various areas of the database (space, errors, locks, and so forth), the Health Monitor (HM) provided by Oracle validates most of the areas and components of the database. Checks performed by the HM cover file corruptions, physical and logical corruptions, undo and redo corruptions, and so forth. Besides identifying problems, the HM also reports its findings and recommendations. The HM can be invoked in one of two ways: in reactive mode, where the checks are performed automatically by the database when a critical error is encountered, or manually, when the DBA wants to run a specific check at a specific time because of suspected issues in certain areas of the database.
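
A manual run could be sketched from a client script as follows. This is a minimal sketch, not a definitive procedure: it assumes a Python DB-API cursor that accepts named bind variables (as the python-oracledb driver does), and it relies on the DBMS_HM package, whose RUN_CHECK procedure starts a check and whose GET_RUN_REPORT function returns the report for a named run. The helper name and the run name in the example are hypothetical.

```python
# PL/SQL sent to the database: DBMS_HM.RUN_CHECK starts a manual check, and
# DBMS_HM.GET_RUN_REPORT returns the report for a named run.
RUN_CHECK_SQL = (
    "BEGIN DBMS_HM.RUN_CHECK(check_name => :check_name, "
    "run_name => :run_name); END;"
)
REPORT_SQL = "SELECT DBMS_HM.GET_RUN_REPORT(:run_name) FROM dual"


def run_health_check(cursor, check_name, run_name):
    """Run one HM check manually and return its report text (hypothetical helper)."""
    cursor.execute(RUN_CHECK_SQL, {"check_name": check_name, "run_name": run_name})
    cursor.execute(REPORT_SQL, {"run_name": run_name})
    row = cursor.fetchone()
    report = row[0] if row else None
    # Some drivers return a LOB object for CLOB values; read it if so.
    return report.read() if hasattr(report, "read") else report


# Example (requires a live connection; the run name is illustrative):
# report = run_health_check(conn.cursor(), "Dictionary Integrity Check", "hm_manual_run_01")
```

The check name must match a name listed in V$HM_CHECK; some checks also expect input parameters, which this sketch does not pass.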

Apart from the distinction between reactive and manual checks, some checks can be run when the database is online and others can be run when the database is offline. The list of checks and their execution type can be viewed by querying the V$HM_CHECK view.
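
The V$HM_CHECK lookup might be wrapped in a small helper like the one below. This is a sketch under stated assumptions: the cursor is any Python DB-API cursor connected to the database, the helper name is hypothetical, and the view is assumed to expose the NAME, INTERNAL_CHECK, OFFLINE_CAPABLE, and DESCRIPTION columns with Y/N flag values.

```python
# Query against the V$HM_CHECK view; the two flag columns hold 'Y' or 'N'.
HM_CHECK_SQL = (
    "SELECT name, internal_check, offline_capable, description "
    "FROM v$hm_check ORDER BY name"
)


def list_hm_checks(cursor):
    """Return one dict per Health Monitor check (hypothetical helper)."""
    cursor.execute(HM_CHECK_SQL)
    return [
        {
            "name": name,
            "internal": internal == "Y",
            "offline_capable": offline == "Y",
            "description": description,
        }
        for name, internal, offline, description in cursor.fetchall()
    ]


# Example (requires a live connection):
# for check in list_hm_checks(conn.cursor()):
#     if not check["internal"]:
#         print(check["name"], check["offline_capable"])
```

Filtering on the internal flag is useful because checks marked as internal are meant to be run by the database itself rather than manually by the DBA.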
