Friday 27 May 2011

Last Sunday

Our production databases are backed up with one of two schedules.  Most days our backups are sent offsite and retained for 3 months before being returned.  But on the last Sunday of each month, our backups are archived for a longer period of time.  Therefore the RMAN backups and Veritas catalogs need to be retained for up to 7 years!

Us Linux DBAs love cron for scheduling jobs. These fancy software schedulers with their dependencies and clever logic have never really replaced a really dense, difficult to read crontab file. And that's just the way we like it in order to keep our jobs!

Now, crontab doesn't have a built-in facility for deciding whether a particular day is the 'last' or 'first' of the month (or quarter, or year).  Our current backup regime relies on a separate cron job to create a lock file, earlier in the day, then each individual backup script checks for existence of that file before deciding which tape schedule and retention policy to use.

I thought there was a slightly more elegant way of achieving this.  Here's what I came up with:

We have a simple script called islastsunday which compares the current date with the date of the last Sunday in the month (from the cal command).

#!/bin/bash
[ $(cal | awk '{print $1}' | tr -s '[:blank:]' '\n' | tail -1) = $(date +%d) ]
In our crontab file we have two entries for our backup, as follows:
# Check if day is LAST SUNDAY of Month. If NOT, run the regular backup
30 00 * * * islastsunday || (rman-job.sh level_0_tape ORCL > level_0_tape_ORCL.log 2>&1; rman-job.sh level_0_tape ORCL2 > level_0_tape_ORCL2.log 2>&1)
# Check if day is LAST SUNDAY of Month. If it IS, run the Monthly backup
30 00 * * * islastsunday && (rman-job.sh level_0_tape_LS ORCL > level_0_tape_LS_ORCL.log 2>&1; rman-job.sh level_0_tape_LS ORCL2 > level_0_tape_LS_ORCL2.log 2>&1)
Both schedules run at the same time, but only one of them will actually execute, depending upon the day of the month. The actual backup commands are bracketed, so that they are ALL dependent upon the islastsunday script, and will all run with the same retention policy, even if the backups continue to run after midnight (i.e. in Monday!).

Friday 20 May 2011

Cold Backup RMAN Fun and Games

Here's an RMAN oddity.  If anyone could explain it, I'd be grateful.

We were moving a database to a new server.  Same OS, same version of Oracle (10.2.0.4).  The database is in Archivelog mode, but to keep things easy, we decided to do an offline, consistent backup.

So, we shutdown the database, then mounted it and performed an RMAN backup to a flash_recovery_area on a shared NFS filesystem:

backup device type disk database plus archivelog include current controlfile;
The backup worked fine. So we logged on to the new server, made sure that we could see the NFS-mounted FRA, and issued the Duplicate command.

run {
allocate channel d1 type disk;
allocate channel d2 type disk;
allocate channel d3 type disk;
allocate channel d4 type disk;
allocate auxiliary channel da1 type disk;
allocate auxiliary channel da2 type disk;
allocate auxiliary channel da3 type disk;
allocate auxiliary channel da4 type disk;
duplicate target database to CATDB;
}
Oddly, it didn't work.  It returned an error saying that a backup could not be found.

Starting restore at 20-MAY-11

released channel: da1
released channel: da2
released channel: da3
released channel: da4
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of Duplicate Db command at 20/05/2011 15:25:38
RMAN-03015: error occurred in stored script Memory Script
RMAN-06026: some targets not found - aborting restore
RMAN-06100: no channel to restore a backup or copy of datafile 1
RMAN-06100: no channel to restore a backup or copy of datafile 2
etc...
If we tried using automatic channel allocations for the restore, it still did not find our disk backup, but tried restoring older tape backups.  Not what we wanted.

So, we tried backing it up again.  Backing up just the archivelogs.  Just the database.  Nada!  The same error every time we tried to Duplicate.

In desperation, I tried opening up our original DB and running an online backup.  Luckily the database was small and the backup quick.  I then shut it down and re-mounted to prevent the users from changing anything.

This time, upon trying the Duplicate command (this time with an "until sequence" inserted) it all worked as expected!

So it looks like a wee bug.  If you've got a database in Archivelog mode, but you perform an offline backup, RMAN seems to get confused and have trouble deciding which backups to restore.

We know it's not a general problem with offline backups, as we've duplicated DBs in Noarchivelog mode on this system before.

Monday 28 March 2011

When is DDL not DDL?

If you ever thought that the LAST_DDL_TIME on a table only tells you the last time that DDL was performed directly upon that table, then think again.

We recently noticed some tables which appeared to have been recently changed, yet we couldn't find any related DDL changes.

Then we remembered that we regularly rebuild a handful of indexes (once a week, on Sunday morning) for performance/plan-stability reasons.

Could it be that rebuilding an index resets the LAST_DDL_TIME for the index's table?

Let's find out.

SQL> create table poo (x date);

Table created.

SQL> create index poodex on poo(x);

Index created.

SQL> select object_name,
to_char(last_ddl_time, 'DD-Mon-YYYY HH24:MI:ss')
2 from dba_objects where object_name like 'POO%';

OBJECT_NAME TO_CHAR(LAST_DDL_TIM
----------- --------------------
POO         22-Mar-2011 10:09:51
POODEX      22-Mar-2011 10:09:51

SQL> alter index poodex rebuild online;

Index altered.

SQL> select object_name,
     to_char(last_ddl_time, 'DD-Mon-YYYY HH24:MI:ss')
2 from dba_objects where object_name like 'POO%';

OBJECT_NAME TO_CHAR(LAST_DDL_TIM
----------- --------------------
POO         22-Mar-2011 10:11:31
POODEX      22-Mar-2011 10:11:31

Yup! In a surprising (to me) result, it looks as though rebuilding an index does indeed reset LAST_DDL_TIME on the table, even though we are not performing DDL on the table directly.

In my opinion, this is a misleading result.

Thanks to Matt Penny for doing all the work and investigation and telling me to publish the results!

EDIT:  Thanks to Neil below for suggestions of other events which may or may not cause this behaviour.

I have tested creating, amending, re-compiling and dropping a procedure, but this does not cause a change in LAST_DDL_TIME for the dependent table.

Example as follows:

14:16:55 PETE @ ETEST1D > create table foo (foocol number(5));

Table created.

14:17:20 PETE @ ETEST1D > @lastddl

OBJECT_NAME                    TO_CHAR(LAST_DDL_TIME,'DD-MON-
------------------------------ ------------------------------
FOO                            28-Mar-2011 14:17:20

14:18:35 PETE @ ETEST1D > create procedure fooproc is
14:19:13   2  v_var varchar2(10);
14:19:26   3  begin
14:19:31   4  null;
14:19:38   5  exception
14:19:42   6  when others then null;
14:19:45   7  end;
14:19:47   8  /

Procedure created.

14:19:50 PETE @ ETEST1D > @lastddl

OBJECT_NAME                    TO_CHAR(LAST_DDL_TIME,'DD-MON-
------------------------------ ------------------------------
FOO                            28-Mar-2011 14:17:20
FOOPROC                        28-Mar-2011 14:19:50

14:20:06 PETE @ ETEST1D > create or replace procedure fooproc is
14:20:31   2  v_var varchar2(10);
14:20:34   3  begin
14:20:36   4  select to_char(foocol)
14:21:02   5  into v_var
14:21:07   6  from foo;
14:21:14   7  exception
14:21:17   8  when others then null;
14:21:18   9  end;
14:21:19  10  /

Procedure created.

14:21:20 PETE @ ETEST1D > @lastddl

OBJECT_NAME                    TO_CHAR(LAST_DDL_TIME,'DD-MON-
------------------------------ ------------------------------
FOO                            28-Mar-2011 14:17:20
FOOPROC                        28-Mar-2011 14:21:20

14:21:31 PETE @ ETEST1D > alter procedure fooproc compile;

Procedure altered.

14:22:07 PETE @ ETEST1D > @lastddl

OBJECT_NAME                    TO_CHAR(LAST_DDL_TIME,'DD-MON-
------------------------------ ------------------------------
FOO                            28-Mar-2011 14:17:20
FOOPROC                        28-Mar-2011 14:22:07

14:22:13 PETE @ ETEST1D > drop procedure fooproc;

Procedure dropped.

14:22:40 PETE @ ETEST1D > @lastddl

OBJECT_NAME                    TO_CHAR(LAST_DDL_TIME,'DD-MON-
------------------------------ ------------------------------
FOO                            28-Mar-2011 14:17:20

Despite creating and dropping procedure, the LAST_DDL_TIME of the depdendent table is not changed. Exactly as expected.

Now trying with a comment (which is a DDL command):

14:24:20 PETE @ ETEST1D > comment on table foo is 'FOO is a test table';

Comment created.

14:24:25 PETE @ ETEST1D > @lastddl

OBJECT_NAME                    TO_CHAR(LAST_DDL_TIME,'DD-MON-
------------------------------ ------------------------------
FOO                            28-Mar-2011 14:24:25

Again - as expected.  A DDL command against the table changes the LAST_DDL_TIME.

This makes it all the more surprising that running a DDL command against an index will change LAST_DDL_TIME for its parent table.