Recovery from corrupted or missing control files

Oracle Backup and Recovery on Windows, Part II – Database Recovery using RMAN:

(Article extracted from Orafusion.com)

1. Introduction:

This article is the second in a series of three, introducing Oracle’s RMAN (Recovery MANager) backup and recovery tool to novice DBAs. The first article dealt with taking RMAN backups. The aim in the present piece is to cover some recovery scenarios. In all the scenarios it is assumed that:

The database host server is still up and running. Recovery in a situation where the server is unavailable will be the subject of the next article.
The last full backup is available on disk.
All archived logs since the last backup are available on disk.

Most of the examples here deal with complete recovery, where all committed transactions up to the point of failure are recovered. We also discuss some cases of incomplete recovery, where the database is recovered to a time prior to failure. Generally one would perform an incomplete recovery only when some of the required logs (archived or online) are missing. One of the examples below deals with just this situation. There are also other valid reasons for performing incomplete recovery – example: to recover a table that has been accidentally dropped. We will not go into these incomplete recovery situations in this article.

The following scenarios are discussed:

Recovery from corrupted or missing datafile(s).
Recovery from corrupted or missing online redo logs.
Recovery from corrupted or missing control files.

In each of the examples discussed it is assumed that RMAN is invoked on the database host machine using an OS account belonging to the ORA_DBA group. This permits connections to the database without a password, i.e. using “/ as sysdba”, as discussed in the first article of this series.

2. Instance and database

In order to understand some of the recovery procedures below, it is necessary to appreciate the difference between an instance and a database.

An instance is made up of Oracle process(es) and associated memory. On Windows operating systems there is a single, multithreaded Oracle process which is associated with a Windows service. The service is normally created when the database is first created, so, as far as this discussion is concerned, the service already exists (for completeness we point out that the service is created and manipulated using the oradim utility – check the Oracle Administrator’s Guide for details on oradim). For our purposes then, an Oracle instance refers to the memory associated with the pre-existing Oracle process. The instance is created when a startup command is issued from the command line (SQL Plus or RMAN for example) or via a GUI tool such as Oracle Enterprise Manager. More on this below. A database , on the other hand, refers to the files comprising the database. These files exist independent of any instance. An instance requires that the computer be powered on and also requires that the Oracle service exists. The database, in contrast, exists even if the computer is powered down.

An instance can be started up in various modes from SQL Plus or RMAN using the startup command. Three variants of the startup command, relevant to the present discussion are:

startup nomount: The instance is started up – i.e. the required memory structures are associated with the pre-existing Oracle process. At this point the instance is not associated with any database.
startup mount: The instance is started up and the database control file is read. At this point Oracle knows the locations of all files that make up the database. However, the database is not open. It is possible to go from the nomount state to the mount state using the SQL command “alter database mount”.
startup: The instance is started up, the control file is read and the database opened for general use. At this point the instance is fully associated with the database. It is possible to go from nomount / mount to the open state using the SQL command “alter database open”.

There are other options to the startup command which we will not go into here. See the Oracle Administrator’s Guide for further details. Note that startup is not a standard SQL command; it can only be executed from an Oracle tool such as SQL Plus or RMAN.

3. Recovery from missing or corrupted datafile(s):

Case 1: Recovery from corrupted or missing datafile

This scenario deals with a situation where a datafile has gone missing, or is corrupted beyond repair. For concreteness, we look at a case where a datafile is missing. Below is a transcript of an SQL Plus session that attempts to open a database with a missing datafile (typed commands in bold, lines in italics are my comments, all other lines are feedback from SQL Plus):

--open SQL Plus from the command line without
 --logging on to database
C:>sqlplus /nolog
SQL*Plus: Release 9.2.0.4.0 - Production on Tue Jan 25 14:52:41 2005
Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.
--Connect to the idle Oracle process as a privileged user and start up instance
SQL> connect / as sysdba
 Connected to an idle instance.
 SQL> startup
 ORACLE instance started.
Total System Global Area 131555128 bytes
 Fixed Size 454456 bytes
 Variable Size 88080384 bytes
 Database Buffers 41943040 bytes
 Redo Buffers 1077248 bytes
 Database mounted.
 ORA-01157: cannot identify/lock data file 4 - see DBWR trace file
 ORA-01110: data file 4: 'D:ORACLE_DATADATAFILESORCLUSERS01.DBF'
SQL>

The error message tells us that file# 4 is missing. Note that although the startup command has failed, the database is in the mount state. Thus, the database control file, which is also the RMAN repository can be accessed by the instance and by RMAN. We now recover the missing file using RMAN. The transcript of the recovery session is reproduced below (bold lines are typed commands, comments in italics, the rest is feedback from RMAN):

--logon to RMAN
C:>rman target /
Recovery Manager: Release 9.2.0.4.0 - Production
Copyright (c) 1995, 2002, Oracle Corporation. All rights reserved.
connected to target database: ORCL (DBID=1507972899)
--restore missing datafile
RMAN> restore datafile 4;
Starting restore at 26/JAN/05
using target database controlfile instead of recovery catalog
 allocated channel: ORA_DISK_1
 channel ORA_DISK_1: sid=14 devtype=DISK
 allocated channel: ORA_DISK_2
 channel ORA_DISK_2: sid=15 devtype=DISK
 channel ORA_DISK_1: starting datafile backupset restore
 channel ORA_DISK_1: specifying datafile(s) to restore from backup set
 restoring datafile 00004 to D:ORACLE_DATADATAFILESORCLUSERS01.DBF
 channel ORA_DISK_1: restored backup piece 1
 piece handle=D:BACKUPQGB0UEC_1_1.BAK tag=TAG20050124T152708 params=NULL
 channel ORA_DISK_1: restore complete
 Finished restore at 26/JAN/05
--recover restored datafile - RMAN applies all logs automatically
RMAN> recover datafile 4;
Starting recover at 26/JAN/05 using channel ORA_DISK_1
 using channel ORA_DISK_2
starting media recovery
archive log thread 1 sequence 4 is already on disk as file E:ORACLE_ARCHIVEORCL1_4.ARC
 archive log thread 1 sequence 5 is already on disk as file C:ORACLE_ARCHIVEORCL1_5.ARC
 archive log thread 1 sequence 6 is already on disk as file E:ORACLE_ARCHIVEORCL1_6.ARC
 archive log thread 1 sequence 7 is already on disk as file E:ORACLE_ARCHIVEORCL1_7.ARC
 archive log filename=E:ORACLE_ARCHIVEORCL1_4.ARC thread=1 sequence=4
 archive log filename=C:ORACLE_ARCHIVEORCL1_5.ARC thread=1 sequence=5
 media recovery complete
 Finished recover at 26/JAN/05
--open database for general use
RMAN> alter database open;
database opened
RMAN>

In the above scenario, the database is already in the mount state before the RMAN session is initiated. If the database is not mounted, you should issue a “startup mount” command before attempting to restore the missing datafile. The database must be mounted before any datafile recovery can be done.

If the database is already open when datafile corruption is detected, you can recover the datafile without shutting down the database. The only additional step is to take the relevant tablespace offline before starting recovery. In this case you would perform recovery at the tablespace level. The commands are:

C:>rman target /
Recovery Manager: Release 9.2.0.4.0 - Production
Copyright (c) 1995, 2002, Oracle Corporation. All rights reserved.
connected to target database: ORCL (DBID=1507972899)
--offline affected tablespace
RMAN> sql 'alter tablespace USERS offline immediate';
using target database controlfile instead of recovery catalog
 sql statement: alter tablespace USERS offline immediate
--recover offlined tablespace
RMAN> recover tablespace USERS;
Starting recover at 26/JAN/05
 allocated channel: ORA_DISK_1
 channel ORA_DISK_1: sid=14 devtype=DISK
 allocated channel: ORA_DISK_2
 channel ORA_DISK_2: sid=12 devtype=DISK
starting media recovery
 media recovery complete
Finished recover at 26/JAN/05
--online recovered tablespace
RMAN> sql 'alter tablespace USERS online';
sql statement: alter tablespace USERS online
RMAN>

Here we have used the SQL command, which allows us to execute arbitrary SQL from within RMAN.

Case 2: Recovery from block corruption

It is possible to recover corrupted blocks using RMAN backups. This is a somewhat exotic scenario, but it can be useful in certain circumstances, as illustrated by the following example. Here’s the situation: a user connected to SQLPlus gets a data block corruption error when she queries a table. Here’s a part of the session transcript:

SQL> connect testuser/testpassword
 Connected.
 SQL> select count(*) from test_table;
 select count(*) from test_table
 *
 ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 4, block # 2015)
 ORA-01110: data file 4: 'D:ORACLE_DATADATAFILESORCLUSERS01.DBF'
Since we know the file and block number, we can perform block level recovery using RMAN. This is best illustrated by example:
C:>rman target /
Recovery Manager: Release 9.2.0.4.0 - Production
Copyright (c) 1995, 2002, Oracle Corporation. All rights reserved.
connected to target database: ORCL (DBID=1507972899)
--restore AND recover specific block
RMAN> blockrecover datafile 4 block 2015;
Starting blockrecover at 26/JAN/05
 using target database controlfile instead of recovery catalog
 allocated channel: ORA_DISK_1
 channel ORA_DISK_1: sid=19 devtype=DISK
 allocated channel: ORA_DISK_2
 channel ORA_DISK_2: sid=20 devtype=DISK
channel ORA_DISK_1: restoring block(s)
 channel ORA_DISK_1: specifying block(s) to restore from backup set
 restoring blocks of datafile 00004
 channel ORA_DISK_1: restored block(s) from backup piece 1
 piece handle=E:BACKUPQGB0UEC_1_1.BAK tag=TAG20050124T152708 params=NULL
 channel ORA_DISK_1: block restore complete
starting media recovery
 media recovery complete
Finished blockrecover at 26/JAN/05
RMAN>
Now our user should be able to query the table from her SQLPlus session. Here's her session transcript after block recovery.
SQL> select count(*) from test_table;
COUNT(*)
 ----------
 217001
SQL>

A couple of important points regarding block recovery:

Block recovery can only be done using RMAN.
The entire database can be open while performing block recovery.
Check all database files for corruption. This is important – there could be other corrupted blocks. Verification of database files can be done using RMAN or the dbverify utility. To verify using RMAN simply do a complete database backup with default settings. If RMAN detects block corruption, it will exit with an error message pointing out the guilty file/block.

4. Recovery from missing or corrupted redo log group:

Case 1: A multiplexed copy of the missing log is available.

If a redo log is missing, it should be restored from a multiplexed copy, if possible. This is the only way to recover without any losses. Here’s an example, where I attempt to startup from SQLPlus when a redo log is missing:

SQL> startup
 ORACLE instance started.
Total System Global Area 131555128 bytes
 Fixed Size 454456 bytes
 Variable Size 88080384 bytes
 Database Buffers 41943040 bytes
 Redo Buffers 1077248 bytes
 Database mounted.
 ORA-00313: open failed for members of log group 3 of thread 1
 ORA-00312: online log 3 thread 1: 'D:ORACLE_DATALOGSORCLREDO03A.LOG'
SQL>

To fix this we simply copy REDO03A.LOG from its multiplexed location on E: to the above location on E:. After copying the file, we issue an “alter database open” from the above SQLPlus session:

SQL> alter database open;
Database altered.
SQL>

That’s it – the database is open for use.

Case 2: All members of a log group lost.

In this case an incomplete recovery is the best we can do. We will lose all transactions from the missing log and all subsequent logs. We illustrate using the same example as above. The error message indicates that members of log group 3 are missing. We don’t have a copy of this file, so we know that an incomplete recovery is required. The first step is to determine how much can be recovered. In order to do this, we query the V$LOG view (when in the mount state) to find the system change number (SCN) that we can recover to (Reminder: the SCN is a monotonically increasing number that is incremented whenever a commit is issued):

--The database should be in the mount state for v$log access
SQL> select first_change# from v$log where group#=3 ;
FIRST_CHANGE#
 -------------
 370255
SQL>

The FIRST_CHANGE# is the first SCN stamped in the missing log. This implies that the last SCN stamped in the previous log is 370254 (FIRST_CHANGE#-1). This is the highest SCN that we can recover to. In order to do the recovery we must first restore ALL datafiles to this SCN, followed by recovery (also up to this SCN). This is an incomplete recovery, so we must open the database resetlogs after we’re done. Here’s a transcript of the recovery session (typed commands in bold, comments in italics, all other lines are RMAN feedback):

C:>rman target /
Recovery Manager: Release 9.2.0.4.0 - Production
Copyright (c) 1995, 2002, Oracle Corporation. All rights reserved.
connected to target database: ORCL (DBID=1507972899)
--Restore ENTIRE database to determined SCN
RMAN> restore database until scn 370254;
Starting restore at 26/JAN/05
using channel ORA_DISK_1
 using channel ORA_DISK_2
 channel ORA_DISK_1: starting datafile backupset restore
 channel ORA_DISK_1: specifying datafile(s) to restore from backup set
 restoring datafile 00001 to D:ORACLE_DATADATAFILESORCLSYSTEM01.DBF
 restoring datafile 00004 to D:ORACLE_DATADATAFILESORCLUSERS01.DBF
 channel ORA_DISK_2: starting datafile backupset restore
 channel ORA_DISK_2: specifying datafile(s) to restore from backup set
 restoring datafile 00002 to D:ORACLE_DATADATAFILESORCLUNDOTBS01.DBF
 restoring datafile 00003 to D:ORACLE_DATADATAFILESORCLTOOLS01.DBF
 channel ORA_DISK_2: restored backup piece 1
 piece handle=E:BACKUP13GB14IB_1_1.BAK tag=TAG20050124T171139 params=NUL
 channel ORA_DISK_2: restore complete
 channel ORA_DISK_1: restored backup piece 1
 piece handle=E:BACKUP14GB14IB_1_1.BAK tag=TAG20050124T171139 params=NUL
 channel ORA_DISK_1: restore complete
 Finished restore at 26/JAN/05
--Recover database
RMAN> recover database until scn 370254;
Starting recover at 26/JAN/05
 using channel ORA_DISK_1
 using channel ORA_DISK_2
starting media recovery
archive log thread 1 sequence 9 is already on disk as file E:ORACLE_ARCHIVEORCL1_9.ARC
 archive log thread 1 sequence 10 is already on disk as file E:ORACLE_ARCHIVEORCL1_10.ARC
 archive log thread 1 sequence 11 is already on disk as file E:ORACLE_ARCHIVEORCL1_11.ARC
 archive log thread 1 sequence 12 is already on disk as file E:ORACLE_ARCHIVEORCL1_12.ARC
 archive log filename=E:ORACLE_ARCHIVEORCL1_9.ARC thread=1 sequence=9
 archive log filename=E:ORACLE_ARCHIVEORCL1_10.ARC thread=1 sequence=10
 media recovery complete
 Finished recover at 26/JAN/05
--open database with RESETLOGS (see comments below)
RMAN> alter database open resetlogs;
database opened
RMAN>

The following points should be noted:

The entire database must be restored to the SCN that has been determined by querying v$log.
All changes beyond that SCN are lost. This method of recovery should be used only if you are sure that you cannot do better. Be sure to multiplex your redo logs, and (space permitting) your archived logs!
The database must be opened with RESETLOGS, as a required log has not been applied. This resets the log sequence to zero, thereby rendering all prior backups worthless. Therefore, the first step after opening a database RESETLOGS is to take a fresh backup. Note that the RESETLOGS option must be used for any incomplete recovery.

5. Recovery from missing or corrupted control file:

Case 1: A multiplexed copy of the control file is available.

On startup Oracle must read the control file in order to find out where the datafiles and online logs are located. Oracle expects to find control files at locations specified in the CONTROL_FILE initialisation parameter. The instance will fail to mount the database if any one of the control files are missing or corrupt. A brief error message will be displayed, with further details recorded in the alert log. The exact error message will vary depending on what has gone wrong. Here’s an example:

SQL> startup
ORACLE instance started.
Total System Global Area 135338868 bytes
 Fixed Size 453492 bytes
 Variable Size 109051904 bytes
 Database Buffers 25165824 bytes
 Redo Buffers 667648 bytes
 ORA-00205: error in identifying controlfile, check alert log for more info
SQL>

On checking the alert log, as suggested, we find the following:

ORA-00202: controlfile: 'e:oracle_dup_destcontrolfileORCLcontrol02.ctl'
 ORA-27046: file size is not a multiple of logical block size
 OSD-04012: file size mismatch (OS 5447783)

The above corruption was introduced by manually editing the control file when the database was closed.

The solution is simple, provided you have at least one uncorrupted control file – replace the corrupted control file with a copy using operating system commands. Remember to rename the copied file. The database should now start up without any problems.

Case 2: All control files lost

What if you lose all your control files? In that case you have no option but to use a backup control file. The recovery needs to be performed from within RMAN, and requires that all logs (archived and current online logs) since the last backup are available. The logs are required because all datafiles must also be restored from backup. The database will then have to be recovered up to the time the control files went missing. This can only be done if all intervening logs are available. Here’s an annotated transcript of a recovery session (as usual, lines in bold are commands to be typed, lines in italics are explanatory comments, other lines are RMAN feedback):

-- Connect to RMAN
 C:>rman target /
Recovery Manager: Release 9.2.0.4.0 - Production
Copyright (c) 1995, 2002, Oracle Corporation. All rights reserved.
connected to target database: ORCL (not mounted)
-- set DBID - get this from the name of the controlfile autobackup.
 -- For example, if autobackup name is
 -- CTL_SP_BAK_C-1507972899-20050124-00 the the DBID is
 -- 1507972899. This step will not be required if the instance is
 -- started up from RMAN
RMAN> set dbid 1507972899
executing command: SET DBID
--restore controlfile from autobackup. The backup is not at the default
 --location so the path must be specified
RMAN> restore controlfile from 'e:backupCTL_SP_BAK_C-1507972899-20050124-00';
Starting restore at 26/JAN/05
using channel ORA_DISK_1
 channel ORA_DISK_1: restoring controlfile
 channel ORA_DISK_1: restore complete
 replicating controlfile
 input filename=D:ORACLE_DATACONTROLFILEORCLCONTROL01.CTL
 output filename=E:ORACLE_DUP_DESTCONTROLFILEORCLCONTROL02.CTL
 output filename=C:ORACLE_DUP_DESTCONTROLFILEORCLCONTROL03.CTL
 Finished restore at 26/JAN/05
-- Now that control files have been restored, the instance can mount the
 -- database.
RMAN> mount database;
database mounted
-- All datafiles must be restored, since the controlfile is older than the current
 -- datafiles. Datafile restore must be followed by recovery up to the current log.
RMAN> restore database;
Starting restore at 26/JAN/05
using channel ORA_DISK_1
 channel ORA_DISK_1: starting datafile backupset restore
 channel ORA_DISK_1: specifying datafile(s) to restore from backup set
 restoring datafile 00001 to D:ORACLE_DATADATAFILESORCLSYSTEM01.DBF
 restoring datafile 00004 to D:ORACLE_DATADATAFILESORCLUSERS01.DBF
 channel ORA_DISK_1: restored backup piece 1
 piece handle=E:BACKUPDGB0I79_1_1.BAK tag=TAG20050124T115832 params=NULL
 channel ORA_DISK_1: restore complete
 channel ORA_DISK_1: starting datafile backupset restore
 channel ORA_DISK_1: specifying datafile(s) to restore from backup set
 restoring datafile 00002 to D:ORACLE_DATADATAFILESORCLUNDOTBS01.DBF
 restoring datafile 00003 to D:ORACLE_DATADATAFILESORCLTOOLS01.DBF
 channel ORA_DISK_1: restored backup piece 1
 piece handle=E:BACKUPCGB0I78_1_1.BAK tag=TAG20050124T115832 params=NULL
 channel ORA_DISK_1: restore complete
 Finished restore at 26/JAN/05
--Database must be recovered because all datafiles have been restored from
 -- backup
RMAN> recover database;
Starting recover at 26/JAN/05
 using channel ORA_DISK_1
starting media recovery
archive log thread 1 sequence 2 is already on disk as file E:ORACLE_ARCHIVEORCL1_2.ARC
 archive log thread 1 sequence 4 is already on disk as file D:ORACLE_DATALOGSORCLREDO02A.LOG
 archive log thread 1 sequence 5 is already on disk as file D:ORACLE_DATALOGSORCLREDO01A.LOG
 archive log thread 1 sequence 6 is already on disk as file D:ORACLE_DATALOGSORCLREDO03A.LOG
 archive log filename=E:ORACLE_ARCHIVEORCL1_2.ARC thread=1 sequence=2
 archive log filename=E:ORACLE_ARCHIVEORCL1_3.ARC thread=1 sequence=3
 archive log filename=E:ORACLE_DATALOGSORCLREDO02A.LOG thread=1 sequence=4
 archive log filename=E:ORACLE_DATALOGSORCLREDO01A.LOG thread=1 sequence=5
 archive log filename=E:ORACLE_DATALOGSORCLREDO03A.LOG thread=1 sequence=6
 media recovery complete
 Finished recover at 26/JAN/05
-- Recovery completed. The database must be opened with RESETLOGS
 -- because a backup control file was used. Can also use
 -- "alter database open resetlogs" instead.
RMAN> open resetlogs database;
database opened

Several points are worth emphasising.

Recovery using a backup controlfile should be done only if a current control file is unavailable.
All datafiles must be restored from backup. This means the database will need to be recovered using archived and online redo logs. These MUST be available for recovery until the time of failure.
As with any database recovery involving RESETLOGS, take a fresh backup immediately.

Technically the above is an example of complete recovery – since all committed transactions were recovered. However, some references consider this to be incomplete recovery because the database log sequence had to be reset.

After recovery using a backup controlfile, all temporary files associated with locally-managed tablespaces are no longer available. You can check that this is so by querying the view V$TEMPFILE – no rows will be returned. Therefore tempfiles must be added (or recreated) before the database is made available for general use. In the case at hand, the tempfile already exists so we merely add it to the temporary tablespace. This can be done using SQLPlus or any tool of your choice:

SQL> alter tablespace temp add tempfile
 'D:oracle_datadatafilesORCLTEMP01.DBF';
Tablespace altered.
SQL>

Check that the file is available by querying v$TEMPFILE.

6. Wrap up:

In an article this size it is impossible to cover all possible recovery scenarios that one might encounter in real life. The above examples will, I hope, provide you with some concrete situations to try out on your test box. The best preparation for real-life recovery is practice. Simulate as many variations of the above situations, and others, as you can think up. Then try recovering from them. The exercise will improve your recovery skills, clarify conceptual issues and highlight deficiencies in your backup strategy.

Source

Reference Links:

The quick and easy RMAN restore

Tag: Recovery from corrupted or missing control files

RMAN – Database Recovery