Oracle Backup and Recovery on Windows, Part III – Disaster Recovery using RMAN:
(Article extracted from Orafusion.com)
1. Introduction:
This is the final article of a three part series, introducing Oracle’s RMAN (Recovery MANager) backup and recovery tool to novice DBAs. The first article dealt with taking RMAN backups and the second one covered some recovery scenarios. This article discusses disaster recovery – i.e. a situation in which your database server has been destroyed and has taken all your database files (control files, logs and data files) with it. Obviously, recovery from a disaster of this nature is dependent on what you have in terms of backups and hardware resources. We assume you have the following available after the disaster:
- A server with the same disk layout as the original.
- The last full hot backup on tape.
With the above items at hand, it is possible to recover all data up to the last full backup. One can do better if subsequent archive logs (after the last backup) are available. In our case these aren’t available, since our only archive destination was on the destroyed server (see Part I ). Oracle provides methods to achieve better data protection. We will discuss some of these towards the end of the article.
Now on with the task at hand. The high-level steps involved in disaster recovery are:
- Build replacement server.
- Restore backup from tape.
- Install database software.
- Create Oracle service.
- Restore and recover database.
It sounds quite straightforward and it is reasonably so. However, there are some details that are often missed in books and documentation. The devil, as always, is in these details. Here we hope to provide you with enough detail to plan and practice disaster recovery in your test environment.
2. Build the server
You need a server to host the database, so the first step is to acquire or build the new machine. This is not strictly a DBA task, so we won’t delve into details here. The main point to keep in mind is that the replacement server should, as far as possible, be identical to the old one. In particular, pay attention to the following areas:
Disk layout and capacity: Ideally the server should have the same number of disks as the original. This avoids messy renaming of files during recovery. Obviously, the new disks should also have enough space to hold all software and data that was on the original server.
Operating system, service pack and patches: The operating system environment should be the same as the original, right up to service pack and patch level.
Memory: The new server must have enough memory to cater to Oracle and operating system / other software requirements. Oracle memory structures (Shared pool, db buffer caches etc) will be sized identically to the original database instance. Use of the backup server parameter file will ensure this.
Although you probably won’t build and configure the machine yourself. It is important to work with your systems people so that the above items are built to the recovery server specs.
3. Restore backup from tape
The next step is to get your backup from tape on to disk. In our example from Part I, the directory to be restored is e:backup. The details of this depend on the backup product used, so we can’t go into it any further. This task would normally be performed your local sysadmin.
4. Install Oracle Software
Now we get to the meat of the database recovery process. The next step is to install Oracle software on the machine. The following points should be kept in mind when installing the software:
- Install the same version of Oracle as was on the destroyed server. The version number should match right down to the patch level, so this may be a multi-step process involving installation followed by the application of one or more patchsets and patches.
- Do not create a new database at this stage.
- Create a listener using the Network Configuration Assistant. Ensure that it has the same name and listening ports as the original listener. Relevant listener configuration information can be found in the backed up listener.ora file.
4. Create directory structure for database files
After software installation is completed, create all directories required for datafiles, (online and archived) logs, control files and backups. All directory paths should match those on the original server. This, though not mandatory, saves additional steps associated with renaming files during recovery.
Don’t worry if you do not know where the database files should be located. You can obtain the required information from the backup spfile and control file at a later stage. Continue reading – we’ll come back to this later.
5. Create Oracle service
As described in section 2 of Part II, an Oracle service must be exist before a database is created. The service is created using the oradim utility, which must be run from the command line. The following commands show how to create and modify a service (comments in italics, typed commands in bold):
–create a new service with manual startup
C:>oradim -new -sid ORCL -startmode m
–modify service to startup automatically
C:>oradim -edit -sid ORCL -startmode a
Unfortunately oradim does not give any feedback, but you can check that the service exists via the Services administrative panel. The service has been configured to start automatically when the computer is powered up. Note that oradim offers options to delete, startup and shutdown a service. See the documentation for details.
6. Restore and recover database
Now it is time to get down to the nuts and bolts of database recovery. There are several steps, so we’ll list them in order:
- Copy password and tnsnames file from backup: The backed up password file and tnsnames.ora files should be copied from the backup directory (e:backup, in our example) to the proper locations. Default location for password and tnsnames files are ORACLE_HOMEdatabase ORACLE_HOMEnetworkadmin respectively.
- Set ORACLE_SID environment variable: ORACLE_SID should be set to the proper SID name (ORCL in our case). This can be set either in the registry (registry key: HKLMSoftwareOracleHOMEORACLE_SID) or from the system applet in the control panel.
- Invoke RMAN and set the DBID: We invoke rman and connect to the target database as usual. No login credentials are required since we connect from an OS account belonging to ORA_DBA. Note that RMAN accepts a connection to the database although the database is yet to be recovered. RMAN doesn’t as yet “know” which database we intend to connect to. We therefore need to identify the (to be restored) database to RMAN. This is done through the database identifier (DBID). The DBID can be figured out from the name of the controlfile backup. Example: if you use the controlfile backup format suggested in Part I, your controlfile backup name will be something like “CTL_SP_BAK_C-1507972899-20050228-00”. In this case the DBID is 1507972899. Here’s a transcript illustrating the process of setting the DBID:
C:>rman Recovery Manager: Release 9.2.0.4.0 - Production Copyright (c) 1995, 2002, Oracle Corporation. All rights reserved. RMAN> connect target / connected to target database (not started) RMAN> set dbid 1507972899 executing command: SET DBID RMAN>
- Restore spfile from backup: To restore the spfile, you first need to startup the database in the nomount state. This starts up the database using a dummy parameter file. After that you can restore the spfile from the backup (which has been restored from tape in Section 3). Finally you restart the database in nomount state. The restart is required in in order to start the instance using the restored parameter file. Here is an example RMAN transcript for the foregoing procedure. Note the difference in SGA size and components between the two startups:
RMAN> startup nomount startup failed: ORA-01078: failure in processing system parameters LRM-00109: could not open parameter file 'C:ORACLEORA92DATABASEINITORCL.ORA' trying to start the Oracle instance without parameter files ... Oracle instance started Total System Global Area 97590928 bytes Fixed Size 454288 bytes Variable Size 46137344 bytes Database Buffers 50331648 bytes Redo Buffers 667648 bytes RMAN> restore spfile from 'e:backupCTL_SP_BAK_C-1507972899-20050228-00'; Starting restore at 01/MAR/05 using target database controlfile instead of recovery catalog allocated channel: ORA_DISK_1 channel ORA_DISK_1: sid=9 devtype=DISK channel ORA_DISK_1: autobackup found: e:backupCTL_SP_BAK_C-1507972899-20050228-00 channel ORA_DISK_1: SPFILE restore from autobackup complete Finished restore at 01/MAR/05 RMAN> startup force nomount Oracle instance started Total System Global Area 1520937712 bytes Fixed Size 457456 bytes Variable Size 763363328 bytes Database Buffers 754974720 bytes Redo Buffers 2142208 bytes RMAN> The instance is now started up with the correct initialisation parameters.
We are now in a position to determine the locations of control file and archive destination, as this information sits in the spfile. This is done via SQL Plus as follows:
C:>sqlplus /nolog ....output not shown SQL>connect / as sysdba Connected. SQL> show parameter control_file ....output not shown SQL> show parameter log_archive_dest ....output not shown
The directories listed in the CONTROL_FILES and LOG_ARCHIVE_DEST_N parameters should be created at this stage if they haven’t been created earlier.
- Restore control file from backup: The instance now “knows” where the control files should be restored, as this is listed in the CONTROL_FILES initialisation parameter. Therefore, the next step is to restore these files from backup. Once the control files are restored, the instance should be restarted in mount mode. A restart is required because the instance must read the initialisation parameter file in order to determine the control file locations. At the end of this step RMAN also has its proper configuration parameters, as these are stored in the control file.
Here is a RMAN session transcript showing the steps detailed here:
RMAN> restore controlfile from 'e:backupCTL_SP_BAK_C-1507972899-20050228-00'; Starting restore at 01/MAR/05 allocated channel: ORA_DISK_1 hannel ORA_DISK_1: sid=13 devtype=DISK channel ORA_DISK_1: restoring controlfile channel ORA_DISK_1: restore complete replicating controlfile input filename=D:ORACLE_DATACONTROLFILEORCLCONTROL01.CTL output filename=E:ORACLE_DATACONTROLFILEORCLCONTROL02.CTL output filename=C:ORACLE_DUP_DESTCONTROLFILEORCLCONTROL03.CTL Finished restore at 01/MAR/05 RMAN> shutdown Oracle instance shut down RMAN> exit Recovery Manager complete. C:>rman target / Recovery Manager: Release 9.2.0.4.0 - Production Copyright (c) 1995, 2002, Oracle Corporation. All rights reserved. connected to target database (not started) RMAN> startup mount; Oracle instance started database mounted Total System Global Area 1520937712 bytes Fixed Size 457456 bytes Variable Size 763363328 bytes Database Buffers 754974720 bytes Redo Buffers 2142208 bytes RMAN> show all; using target database controlfile instead of recovery catalog RMAN configuration parameters are: CONFIGURE RETENTION POLICY TO REDUNDANCY 2; CONFIGURE BACKUP OPTIMIZATION OFF; # default CONFIGURE DEFAULT DEVICE TYPE TO DISK; # default CONFIGURE CONTROLFILE AUTOBACKUP ON; CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO 'e:backupctl_sp_bak_%F'; CONFIGURE DEVICE TYPE DISK PARALLELISM 2; CONFIGURE DATAFILE BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default CONFIGURE ARCHIVELOG BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default CONFIGURE CHANNEL 1 DEVICE TYPE DISK FORMAT 'e:backup%U.bak' MAXPIECESIZE 4G; CONFIGURE CHANNEL 2 DEVICE TYPE DISK FORMAT 'e:backup%U.bak' MAXPIECESIZE 4G; CONFIGURE MAXSETSIZE TO UNLIMITED; # default CONFIGURE SNAPSHOT CONTROLFILE NAME TO 'C:ORACLEORA92DATABASESNCFORCL.ORA'; # default RMAN>
At this stage we can determine the locations of data files and redo logs if we don’t know where they should go. This is done from SQL Plus as follows:
C:>sqlplus /nolog ...output not shown SQL> connect / as sysdba Connected. SQL> select name from v$datafile; ...output not shown SQL> select member from v$logfile; ...output not shown SQL>
The directories shown in the output should be created manually if this hasn’t been done earlier.
- Restore all datafiles: This is easy. Simply issue a “restore database” command from RMAN, and it will do all the rest for you:
RMAN> restore database; Starting restore at 01/MAR/05 using target database controlfile instead of recovery catalog allocated channel: ORA_DISK_1 channel ORA_DISK_1: sid=11 devtype=DISK allocated channel: ORA_DISK_2 channel ORA_DISK_2: sid=8 devtype=DISK channel ORA_DISK_1: starting datafile backupset restore channel ORA_DISK_1: specifying datafile(s) to restore from backup set restoring datafile 00001 to D:ORACLE_DATADATAFILESORCLSYSTEM01.DBF restoring datafile 00003 to D:ORACLE_DATADATAFILESORCLUSERS01.DBF restoring datafile 00004 to D:ORACLE_DATADATAFILESORCLUSERS02.DBF channel ORA_DISK_2: starting datafile backupset restore channel ORA_DISK_2: specifying datafile(s) to restore from backup set restoring datafile 00002 to D:ORACLE_DATADATAFILESORCLUNDOTBS01.DBF restoring datafile 00005 to D:ORACLE_DATADATAFILESORCLTOOLS01.DBF restoring datafile 00006 to D:ORACLE_DATADATAFILESORCLTOOLS02.DBF channel ORA_DISK_2: restored backup piece 1 piece handle=E:BACKUP80G6E1TT_1_1.BAK tag=TAG20041130T222501 params=NULL channel ORA_DISK_1: restored backup piece 1 piece handle=E:BACKUP81G6E1TU_1_1.BAK tag=TAG20041130T222501 params=NULL channel ORA_DISK_2: restored backup piece 2 piece handle=E:BACKUP80G6E1TT_2_1.BAK tag=TAG20041130T222501 params=NULL channel ORA_DISK_1: restored backup piece 2 piece handle=E:BACKUP81G6E1TU_2_1.BAK tag=TAG20041130T222501 params=NULL channel ORA_DISK_1: restored backup piece 3 piece handle=E:BACKUP81G6E1TU_3_1.BAK tag=TAG20041130T222501 params=NULL channel ORA_DISK_1: restore complete channel ORA_DISK_2: restored backup piece 3 piece handle=E:BACKUP80G6E1TT_3_1.BAK tag=TAG20041130T222501 params=NULL channel ORA_DISK_2: restore complete Finished restore at 01/MAR/05 RMAN>
- Recover database: The final step is to recover the database. Obviously recovery is dependent on the available archived (and online) redo logs. Since we have lost our database server and have no remote archive destination, we can recover only up to the time of the backup. Further, since this is an incomplete recovery, we will have to open the database with resetlogs. Here’s a sample RMAN session illustrating this:
RMAN> recover database; Starting recover at 01/MAR/05 using channel ORA_DISK_1 using channel ORA_DISK_2 starting media recovery unable to find archive log archive log thread=1 sequence=1388 RMAN-00571: ============================== RMAN-00569: =ERROR MESSAGE STACK FOLLOWS = RMAN-00571: =============================== RMAN-03002: failure of recover command at 04/01/2005 14:14:43 RMAN-06054: media recovery requesting unknown log: thread 1 scn 32230460 RMAN> alter database open resetlogs; database opened RMAN>
Note that RMAN automatically applies all available archive logs. It first applies the backed up log and then searches for subsequent logs in the archive destination. This opens the door for further recovery if the necessary logs are available. In our case, however, we have no more redo so we open the database with resetlogs. The error message above simply indicates that RMAN has searched, unsuccessfully, for subsequent logs.
That’s it. The database has been recovered, from scratch, to the last available backup. Now having done this, it is worth spending some time in discussing how one can do better – i.e. recover up to a point beyond the backup. We do this in the next section.
7. Options for better recovery
The above recovery leaves one with a sense of dissatisfaction: one could have done much better had the necessary logs been available. Clearly, one would have to copy the logs to a remote machine in order to guarantee access in a disaster situation. A couple of ways to do this include:
Copy archive logs to a remote destination using OS scripts: This is achieved simply by a script scheduled to run every hour or so. The script copies, to a remote network computer, all the archive logs generated since the script last ran. This achieves better recoverability than before. However, it does not achieve up to the minute recovery. Further, one has to ensure that the a log switch is performed before the logs are copied, so as to ensure that redo associated with recent transactions (within the last hour) is copied to the remote destination.The log switch can be performed using the “alter system archive log current” command.
Configure the Oracle ARC process to copy logs to the remote destination: This is done by defining a secondary archive destination via one of the LOG_ARCHIVE_DEST_N initialisation parameters. This method is not recommended because it is somewhat fragile – see this discussion on Tom Kyte’s site, for example. Be sure to use a mapped network drive as the archive destination if you choose to go down this path. Oracle 9i will not recognise archive log destinations specified using UNC (Universal Naming Convention).
Finally, any article on disaster recovery should mention Oracle Data Guard – which is Oracle Corporation’s recommended disaster recovery solution for mission critical systems. This is essentially a standby database that is kept synchronised with the primary through the continuous application of redo. There are different levels of synchronisation depending on required availability, performance and (most important!) the acceptable data loss. There are two types of standby databases depending on how redo is applied: 1) Physical standby – which is an identical, block for block, copy of the primary, and 2) Logical standby – which is kept synchronised by applying SQL mined from redo logs. Interested readers are referred to the Oracle documentation on Data Guard for further details, as it is a vast subject, appropriate for a book rather than an article this size. A good starting point is Oracle Data Guard Concepts and Administration guide.
8. Concluding Remarks
This brings me to the end of the three part series on RMAN. I hope the material covered helps you plan your backup and recovery strategy. Remember this: your backup strategy should be dictated by business requirements rather than the latest and greatest technology. It is technically not hard to implement up-to-the-minute recovery, given the appropriate hardware and network bandwidth. However, unless you work for a bank or similar business, your end-users will likely accept some data loss once they hear the costs involved in up-to-the-minute recovery. If your requirements do allow for some data loss, it should be possible to implement a robust recovery strategy using the concepts and procedures discussed in this series of articles.
Reference Links: