Tuesday 23 December 2008

RMAN Duplicate Gotchas - Part 2: Standby DB Restores and Redo Log Group

In my previous post I discussed identifying which media you would need to use when performing a restore in order to duplicate a database via RMAN.

In this post, I'll discuss a couple of other issues that you may come across; although, admittedly, they are fairly uncommon.

Standby DB Restores

Like a lot of other sites we maintain a physical standby database, managed using Dataguard, ready to use as a failover instance, in case of problems with the main RAC database. This database is the source for all backups, to avoid overloading the production servers. This leads to a small gotcha when performing any restores from the production database to create a duplicate or clone DB.

When backing-up we connect, as expected to the standby database and the RMAN repository catalog, as below:

rman catalog=rman/password@catalog_db target=sys/passwd@standby_db

RMAN then backs up the file contents from the standby database location, as expected, but because the database has the same DBID as the production database, they are marked as having the file names of the production service.

When you come to perform the RMAN duplicate using the standby DB as target will not work correctly. When attempting to create the duplicate database's controlfile RMAN will barf, saying that it cannot use a standby controlfile as source for a duplicate. You therefore have to connect to your production server directly to run the DUPLICATE command:

rman catalog=rman/password@catalog_db target=sys/passwd@prod_db auxiliary=sys/passwd@clone_db

In addition, you will likely need to amend the parameters which are passed to your tape backup software to ensure that it knows where to find the backup files. If, like us, you have separate backup servers for production and other environments, you will need to pass in the relevant policy, backup server and original backup client names which were used to perform the backup:

allocate channel t1 type sbt;
allocate auxiliary channel ta1 type sbt;
allocate channel d1 type disk;
allocate auxiliary channel da1 type disk;
send 'NB_ORA_POLICY=PROD_POLICY,NB_ORA_SERV=prod_bkup_server,
NB_ORA_CLIENT=prod_db_server';

This also means, as seen above, that you will need to allocate channels (and auxiliary channels) manually, rather than relying on any default channgels you may have defined in your clone database environment.

Redo Log Groups

Oracle allows you to create any almost any number of redo log groups on a database and to number them with any numbering scheme you fancy. At our site, the production groups are numbered with a 3-digit scheme, 101, 201, 301, etc, with each group containing multiple log members.

Unfortunately, there is a 'feature' which will crash your DUPLICATE in this particular case. RMAN attempts to re-create the redo log groups from the target system, but the code obviously cannot cope with redo log group numbers which are more than 2-digits wide. We therefore are forced to include the LOGFILE clause with the DUPLICATE command to re-number the groups and allow the command to complete:

duplicate target database to CLONEDB
until scn 36511896776
logfile GROUP 1 ('/u01/ oradata/CLONEDB/redo01_1.dbf',
'/u02/oradata/CLONEDB/redo01_2.dbf') SIZE 500M,
GROUP 2 ('/u01/oradata/CLONEDB/redo02_1.dbf',
'/u02/oradata/CLONEDB/redo02_2.dbf') SIZE 500M,
GROUP 3 ('/u01/oradata/CLONEDB/redo03_1.dbf',
'/u02/oradata/CLONEDB/redo03_2.dbf') SIZE 500M,
GROUP 4 ('/u01/oradata/CLONEDB/redo04_1.dbf',
'/u02/oradata/CLONEDB/redo05_2.dbf') SIZE 500M REUSE;

If you forget to add the LOGFILE clause first time around (as I did), don't despair. The database restore will have already completed. Simply re-run the DUPLICATE command with the additional clause - RMAN knows that the datafiles are now in place and things should pick-up from there.

Monday 1 December 2008

RMAN Duplicate Gotchas - Part 1: Identifying the Media

I've recently been doing quite a few RMAN Duplicate Database jobs, in order to create dev and test systems as copies of our production server.

The production database is a 2-node RAC cluster, but each of our test systems are single-instance databases. This causes one or two odd issues which aren't always obvious until you've hit them at least once.

In this post I'll discuss the first tricky issue which I came across: Identifying the Media.

At my current job (as in most IT departments), each day the most recent backup tapes are removed from the tape robot and sent off-site to a secure storage facility. The company I work for doesn't have the resource to maintain duplexed tapes (i.e. one on-site and one off-site) and, at present, for reasons too complicated to go into here, the production system doesn't have a flash recovery area, so the backups are written to tape, and tape only.

All this means that when a test database refresh is requested, it is critical to identify the set of tapes on which the required backups reside. So, once you've decided to which point of time you wish to restore, it's necessary to identify exactly which tapes you need to recall in order to load into the robot.

1. Log into RMAN (Target and Catalog) and list the backups you're interested in. E.g:

rman catalog=rman/password@catalog_db target

RMAN> list backup;

BS Key Size Device Type Elapsed Time Completion Time
------- ---------- ----------- ------------ ---------------
1063429 1.87G SBT_TAPE 00:03:36 26-NOV-08
BP Key: 1063435 Status: AVAILABLE Compressed: NO Tag: TAG20081126T020905
Handle: m9k0kn62_1_1 Media:

List of Archived Logs in backup set 1063429
Thrd Seq Low SCN Low Time Next SCN Next Time
---- ------- ---------- --------- ---------- ---------
1 58271 36469897611 25-NOV-08 36470958889 25-NOV-08
1 58272 36470958889 25-NOV-08 36472168625 25-NOV-08
1 58273 36472168625 25-NOV-08 36472947809 25-NOV-08
1 58274 36472947809 25-NOV-08 36473847271 25-NOV-08

BS Key Size Device Type Elapsed Time Completion Time
------- ---------- ----------- ------------ ---------------
1063430 1.86G SBT_TAPE 00:03:44 26-NOV-08
BP Key: 1063436 Status: AVAILABLE Compressed: NO Tag: TAG20081126T020905
Handle: mak0kn62_1_1 Media:

List of Archived Logs in backup set 1063430
Thrd Seq Low SCN Low Time Next SCN Next Time
---- ------- ---------- --------- ---------- ---------
2 61976 36471707453 25-NOV-08 36472752913 25-NOV-08
2 61977 36472752913 25-NOV-08 36473892837 25-NOV-08
2 61978 36473892837 25-NOV-08 36475824974 25-NOV-08
2 61979 36475824974 25-NOV-08 36477342218 25-NOV-08

BS Key Type LV Size Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ ---------------
1063514 Full 19.00M SBT_TAPE 00:00:30 26-NOV-08
BP Key: 1063521 Status: AVAILABLE Compressed: NO Tag: TAG20081126T021254
Handle: c-2982994484-20081126-01 Media:
Standby Control File Included: Ckp SCN: 36476925621 Ckp time: 25-NOV-08

[It's interesting to note that LIST BACKUP command in RMAN returns a label (in bold above) indicating that a media reference should be displayed, but I have yet to see any information listed here in any recent version of Oracle (9i or 10g on Windows or Linux).]

You can use the COMPLETED AFTER and/or COMPLETED BEFORE clauses to narrow down the list of displayed backup sets. And don't forget to list the backups starting with the most recent full, or level-0, backup, including any incrementals.

Make a note of the backup TAGs that you wish to check. At this point it's probably also sensible to make a note of which SCN you want to restore to. I find it much easier to cross-reference an SCN with the backup report than an exact point-in-time.

2. Now, Log into the RMAN catalog as the RMAN user:

sqlplus rman/password@catalog_db

3. Run the following SQL command. Enter a comma-separated, single-quoted list of tags as identified in step 1 (as many as you like - I've used two here, for simplicity):

select tag, bs_key, bp_key, start_time, media
from bp
where tag in(&list_of_tags.);

Enter value for list_of_tags: 'TAG20081126T020905','TAG20081126T021254'
old 3: where tag in(&list_of_tags.)
new 3: where tag in('TAG20081126T020905','TAG20081126T021254')

TAG BS_KEY BP_KEY START_TIM MEDIA
------------------ ------- ------- --------- ----------
TAG20081126T020905 1063426 1063432 26-NOV-08 000304
TAG20081126T020905 1063427 1063433 26-NOV-08 000310
TAG20081126T020905 1063428 1063434 26-NOV-08 000304
TAG20081126T020905 1063429 1063435 26-NOV-08 000304
TAG20081126T020905 1063430 1063436 26-NOV-08 000304
TAG20081126T021254 1063514 1063521 26-NOV-08 000304

Output is as above. The media column shows the label of the tape that the backup piece is stored on.

Now you know which tapes are going to be required for your restore, you can recall them from the storage company.

Next post I will consider the next "gotcha" on my list.