Wednesday 24 October 2012

NOT NULL vs. Check Constraints

I found Jonathan Lewis's excellent article about "incorrectly" defining NOT NULL columns some time ago.  You can read his article at http://jonathanlewis.wordpress.com/2010/09/05/not-null/

Having read it, I did some investigation into our production databases and, lo and behold, found some examples of columns with NOT NULL check constraints but without the NOT NULL column definition.

Here's the query I used to find the 'guilty' columns:
 
select o.owner, o.object_type, o.object_name, t.column_name, co.search_condition, co.validated
 from dba_objects o, sys.col$ c, dba_tab_columns t, dba_cons_columns cc, dba_constraints co
 where o.owner = '&table_owner'
 and o.object_type = 'TABLE'
 and o.object_id = c.obj#
 and o.owner = t.owner
 and o.object_name = t.table_name
 and t.column_name = c.name
 and cc.owner = o.owner
 and cc.table_name = t.table_name
 and cc.column_name = c.name
 and co.owner = cc.owner
 and co.constraint_name = cc.constraint_name
 and co.constraint_type = 'C'
 and c.null$ = 0
order by object_name, column_name; 
Note that in 10g and 11g the DBA_CONSTRAINTS.SEARCH_CONDITION column is stored as a LONG.  Which means that you can't filter out NOT NULL check conditions (e.g. "column_name in (0,1)") from any other check conditions in the query.

Deprecated for at least 15 years, but still used in the data dictionary.  Thanks Oracle!

The alternative is to make your own one-off copy of DBA_CONSTRAINTS, using TO_LOB() to convert SEARCH_CONDITION to a CLOB.  Then the query can be re-written as:
 
select o.owner, o.object_type, o.object_name, t.column_name, co.search_condition, co.validated
 from dba_objects o, sys.col$ c, dba_tab_columns t, dba_cons_columns cc, my_dba_constraints co
 where o.owner = '&table_owner'
 and o.object_type = 'TABLE'
 and o.object_id = c.obj#
 and o.owner = t.owner
 and o.object_name = t.table_name
 and t.column_name = c.name
 and cc.owner = o.owner
 and cc.table_name = t.table_name
 and cc.column_name = c.name
 and co.owner = cc.owner
 and co.constraint_name = cc.constraint_name
 and co.constraint_type = 'C'
 and c.null$ = 0
 and co.search_condition not like '% AND %'
 and co.search_condition not like '% OR %'
 and co.search_condition like '%NOT NULL%'
order by object_name, column_name;

Monday 15 October 2012

Fisher-Yates (Knuth) Shuffle in PL/SQL

We were recently looking for a way to easily anonymize some customer data, specifically bank account numbers.

Part of our testing mechanism is to verify that a bank account and sort-code combination is valid and we retrieve the bank details for that combination (using a commercial account lookup service).  So we couldn't just replace the sort-code and/or account number with random numbers.

Conversely, replacing all account numbers with a handful of known, "safe" working values would lead to skewed data spreads, so that wouldn't work.

A better solution is to shuffle the existing data, moving the sort-code and account number pairs to a different, unrelated client record.

We found a description of the Fisher-Yates Shuffle (one variety of which is also known as the Knuth Shuffle) which seemed like a good way to achieve our aim.  It seems to be used quite widely to do exactly what we wanted, but I couldn't find any examples of implementing the shuffle in PL/SQL.

So, here's our solution:

 
set serveroutput on size unlimited
DECLARE

    jswap NUMBER ;
    jint NUMBER := 17 ;
    lcount NUMBER := 0 ;
    lrowcount NUMBER := 0 ;

    TYPE tab_ba     IS TABLE OF client_bank_accts%ROWTYPE INDEX BY BINARY_INTEGER;
    array_ba        tab_ba ;

    CURSOR r IS
    SELECT *
    FROM client_bank_accts;

    sav_banksortcd client_bank_accts.banksortcd%TYPE;
    sav_bankacctnbr client_bank_accts.bankacctnbr%TYPE;

BEGIN

   OPEN r;
   LOOP

      FETCH r 
      BULK COLLECT 
      INTO array_ba LIMIT 5000;

      lrowcount := array_ba.COUNT;

      FOR i IN REVERSE 1 .. array_ba.COUNT LOOP

         jswap := trunc(dbms_random.value(1, i+1));

         sav_banksortcd := array_ba(i).banksortcd;
         sav_bankacctnbr := array_ba(i).bankacctnbr;

         UPDATE client_bank_accts 
         SET    banksortcd = array_ba(jswap).banksortcd,
                bankacctnbr = array_ba(jswap).bankacctnbr
         WHERE  clientbankacctseq = array_ba(i).clientbankacctseq;

         UPDATE client_bank_accts 
         SET    banksortcd = sav_banksortcd,
                bankacctnbr = sav_bankacctnbr
         WHERE  clientbankacctseq = array_ba(jswap).clientbankacctseq;

         lcount := lcount + 2;

      END LOOP;
      EXIT WHEN r%NOTFOUND;

   COMMIT;
   END LOOP;
   CLOSE r;

   dbms_output.put_line('Completed. '||to_char(lcount)||' rows updated!');

END;
/