Chapter 6
RMAN Restore and Recovery
A couple of years ago, I was out on a long Saturday morning bike ride. About halfway through the ride, my cell phone
rang. It was one of the data center operational support technicians. He told me that a mission-critical database
server was acting strange and that I should log in as soon as possible and make sure things were okay. I told him that
I was about 15 minutes from being able to log in. So, I scurried home as fast as I could to check out the production
box. When I got home and logged in to the database server, I tried to start SQL*Plus and immediately got an error
indicating that the SQL*Plus binary file was corrupt. Great. I couldn't even log in to SQL*Plus. This was not good.
Mental Note
Ensure that all bicycle rides are taken out of range of cell phone coverage. - Ed.
I had the system administrator (SA) restore the Oracle binaries from an OS backup. I started SQL*Plus. The database had crashed,
so I attempted to start it. The output indicated that there was a media failure with all the data files. After some analysis,
we discovered that filesystem issues had corrupted all of these files on disk:
Data files
Control files
Archive redo logs
Online redo log files
RMAN backup pieces
This was almost a total disaster. My director asked about our options. I responded, “All we have to do is restore
the database from our last tape backup, and we'll lose whatever data are in archive redo logs that haven't been backed
up to tape yet.”
The storage administrators were called in and instructed to restore the last set of RMAN backups that had been
written to tape. About 15 minutes later, we could hear the tape guys talking to each other in hushed voices. One of
them said, “We are sooooo hosed. We don't have any tape backups of RMAN for any databases on this box.”
That was a dark moment. The worst-case scenario was to rebuild the database from DDL scripts and lose three years
of production data. Not a very palatable option.
After looking around the production box, I discovered that the prior production support DBA (who, ironically,
had just been let go a few days before, owing to budget cuts) had implemented a job to copy the RMAN backups to
another server in the production environment. The RMAN backups on this other server were intact. I was able to
restore and recover the production database from these backups. We lost about a day's worth of data (between corrupt
archive logs and downtime, in which no incoming transactions were allowed), but we were able to get the database
restored and recovered approximately 20 hours after the initial phone call. That was a long day.
 
 