Export/Import Validation Gotchas

When using Datapump, Original Export/Import or some other logical backup/restore mechanism, we need to validate the migrated databases to ensure all objects have been copied and that they are in the correct state (Valid/Invalid, Enabled/Disabled, etc.).

The export and import logs ought to provide enough information for that validation; if we export/import one schema and both the export and the import complete without errors then the migration has probably succeeded. However, the log files don't present information in a particularly helpful format – for example, if an object is imported but won't compile, the log file will contain some pretty ugly looking messages. And even if the logs are error-free, comparing an export log with an import log to identify issues can be awkward.

So it is normal to present the required information in a standard format using predefined reports based on simple SQL queries running against the data dictionaries of the source and target databases.
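
As a simple illustration, a status breakdown per object type is often one of the first reports to run on both sides; a minimal sketch, with SCHEMA1 standing in for the migrated schema:

select owner, object_type, status, count(*)
from dba_objects
where owner = 'SCHEMA1'
group by owner, object_type, status
order by 1, 2, 3
/

Run on both source and target, this catches objects that arrived but failed to compile (VALID vs INVALID) as well as missing object types.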

But beware: I recently ran into issues in a couple of migrations that appeared to show discrepancies between the source and target databases. In both cases they turned out to be due to bugs…

Bug# 9935857 – LOB Indexes are missing from DBA_OBJECTS

Facts:

  • Source and target databases were queried by doing a count per object_type from the DBA_OBJECTS view.
  • Source Database – 11.1
  • Source Database Index count for the migrated schema : 629
  • Target Database  – 11.2
  • Target Database index count for the migrated schema: 648

Root Cause and Workaround:

I noticed straightaway that the difference in the number of indexes was 19, which happened to be the number of LOB objects belonging to that schema. I poked around on MOS and found the bug, which causes LOB indexes to be omitted from the DBA_OBJECTS view. The workaround was to count all indexes via the DBA_INDEXES view and UNION that with the counts of the remaining object types from DBA_OBJECTS…

SELECT … FROM dba_objects WHERE object_type <> 'INDEX' …
UNION
SELECT … FROM dba_indexes …

References: MOS Document 1582869.1.

Bug# 5523375 – expdp does not export disabled PK Constraints

Facts:

  • Source Database – 11.1
  • Target Database  – 11.2
  • Two DISABLED PK constraints in a schema in the source database were missing from the target database after a datapump import.
  • No related errors in the export or import log files

Root Cause and Workarounds:

The root cause was found to be bug# 5523375, which is believed to affect releases prior to 11gR2. The bug is on the export side, and the documented workaround is to enable the PK before the export and then disable it again afterwards. The simpler solution is just to prepare a script to recreate the PK in its DISABLED state following the import. Of course, if the PK is DISABLED then it is probably redundant, but in a migration project we cannot make that judgement, and it is important not to get dragged into remediating application design issues if that isn't in the remit of our project.
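
For example, the recreation script for one of the missing constraints might look something like this (a sketch only; the table, constraint and column names are placeholders, and DBMS_METADATA.GET_DDL against the source database is an easy way to capture the real definitions):

ALTER TABLE schema1.t1
  ADD CONSTRAINT t1_pk PRIMARY KEY (id)
  DISABLE;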

Conclusion

My methodology requires that production migrations are rehearsed wherever possible, and these kinds of issues underline the value of doing that – it is better to have the time to research and explain these discrepancies and prepare a workaround following a rehearsal than to try to investigate and troubleshoot them in the middle of a live migration, with all of the stress and time pressure that implies.

Migration Validation – Object Counts: Scripts, Tips, Gotchas

A key requirement when moving databases is to be able to demonstrate that everything that should have been migrated has indeed been migrated successfully.

Different businesses require different amounts of rigour in terms of the evidence they want to see, depending on their regulatory regime and the nature of their business sector, but a common validation method for Oracle databases is to compare object counts in the source database for the schemas being migrated with the counts in the target system once the migration has completed.

Sounds simple, right? Well it is, but there are a few pitfalls to keep in mind, so I offer some ideas here for using dba_objects as validation evidence in migration projects…

Basic Script

We may start with a simple query to count the number of objects of each object_type, for each schema migrated. It can be helpful to execute this as a distributed query against both source and target databases at the same time to give a consolidated report, but let’s keep it simple and assume we will run this report separately on each database. Assuming we are considering one application schema, called SCHEMA1, let’s check it…

select owner, object_type, count(*) from dba_objects
where owner in ('SCHEMA1')
group by owner, object_type
order by 1,2
/

OWNER           OBJECT_TYPE               COUNT(*)
--------------- ----------------------- ----------
SCHEMA1         DATABASE LINK                    1
SCHEMA1         TABLE                            1

So that’s simple enough – run it on both source and target database and make sure we get the same numbers of the same object types.
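
If a database link back to the source database is available, the comparison can be consolidated into a single distributed query run on the target; a sketch, assuming a link called sourcedatabase (the same link name used later in this post):

select owner, object_type, count(*) objcount
from dba_objects@sourcedatabase
where owner in ('SCHEMA1')
group by owner, object_type
minus
select owner, object_type, count(*) objcount
from dba_objects
where owner in ('SCHEMA1')
group by owner, object_type
/

Any rows returned are object types whose counts differ; running the reverse MINUS as well catches object types that exist only in the target.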

Gotcha 1 – Oracle DBA_OBJECTS bug

This is the one I was investigating that got me started on this post; here’s the scenario…

We are migrating a database schema from an Oracle 11gR1 database to an 11gR2 one using datapump. My current client has some strict validation requirements, and we need to provide various reports as evidence that the migration completed with no issues. We ran the query above to demonstrate that all of the objects in the source database schema had made it to the target database, and got this…

Source…

OBJECT_TYPE           COUNT(*)
------------------- ----------
FUNCTION                    36
INDEX                      629
LOB                         19
...

Target…

OBJECT_TYPE           COUNT(*)
------------------- ----------
FUNCTION                    36
INDEX                      648
LOB                         19
...

Notice the INDEX count in 11gR2 differs from the 11gR1 count by 19. It turns out this is due to Bug# 9935857: in summary, LOB indexes are missing from DBA_OBJECTS in releases 10.2.0.4, 11.1.0.7, 11.2.0.1 and 11.2.0.2, and the bug is fixed in 12.1.0.1 and 11.2.0.3 (and in 11.2.0.2 Patch 18 on Windows platforms).

The documented workaround is to query the dba_indexes view to see the correct number of LOB INDEX objects, so we have a choice: (i) accept the mis-reporting and reference the Oracle bug# in any auditing that might be required for our migrations, or (ii) amend the basic script to query indexes separately…

select object_type, count(*) ObjCount
from dba_objects
where owner = 'SCHEMA1'
and   object_type <> 'INDEX'
group by object_type
union
select 'INDEX' object_type, sum(count(*)) ObjCount
from dba_indexes
where owner = 'SCHEMA1'
group by index_type
order by object_type
/

Gives a correct object count on the source database…

OBJECT_TYPE           COUNT(*)
------------------- ----------
FUNCTION                    36
INDEX                      648
LOB                         19
...

Gotcha 2 – Recyclebin

The recyclebin was introduced in Oracle 10g and basically allows tables to be preserved for a time after they are dropped, the idea being that there is a window of opportunity in which they can easily be recovered ("un-dropped") if they were dropped in error.

The recyclebin throws up a couple of gotchas if we are querying DBA_OBJECTS as part of our migration validation checks.

The simple workaround is just to purge the recycle bin immediately before the migration, but that is not always possible: the Migration Team may not have sufficient privileges, and even where the recyclebin is purged, a user or a process could drop further objects into it before the migration gets underway. So it is better to explicitly make sure we exclude recyclebin objects from our object counts.
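
For reference, where we are able to purge, it is a one-liner run by a suitably privileged user:

PURGE DBA_RECYCLEBIN;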

There are two small traps to watch out for with respect to the recyclebin…

The first point is that objects in the recyclebin are shown in dba_objects in Oracle 10g, but not in 11g and 12c. At the time of writing I'm really struggling to find proper documentary evidence of this difference, but it means we need to exclude objects in the recyclebin from our object counts so they are not double-counted.

The second point is that objects not in the recyclebin can still have system-generated recyclebin names. When a table goes into the recyclebin, all Oracle is really doing is renaming it to a system-generated name prefixed with 'BIN$', for example 'BIN$MeVb+SyJS9i/iyaDUUdZKw==$0'. As well as the table, all of its dependent objects (triggers, indexes and constraints) also get renamed. When a table is restored it reverts to its original name, but the dependent objects retain their recyclebin names, so we can see "live" objects using recyclebin names. MOS Note 433768.1 refers.
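
A quick way to spot these is to look for "live" objects whose names still have the recyclebin format but which are not actually in the recyclebin; a sketch:

select owner, object_type, object_name
from dba_objects o
where owner = 'SCHEMA1'
and   object_name like 'BIN$%'
and   not exists (
      select 1 from dba_recyclebin r
      where r.owner = o.owner
      and   r.object_name = o.object_name)
order by 1, 2, 3
/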

So the conclusion is that we need to make sure recyclebin objects are excluded from our object validation checks, but we can't rely on the object name format to do that, so we must reference the dba_recyclebin view instead.

select object_type, count(*) ObjCount
from dba_objects o
where owner = 'SCHEMA1'
and   object_type <> 'INDEX'
and   not exists (
      select 1 from dba_recyclebin r
      where o.owner = r.owner
      and   o.object_name = r.object_name)
group by object_type
union
select 'INDEX' object_type, sum(count(*)) ObjCount
from dba_indexes
where owner = 'SCHEMA1'
group by index_type
order by object_type
/

Gotcha 3 – System-named objects

The scripts above use counts of each object type as the basis of validating the migration. In most cases that’s probably a sufficiently good check. We could go a step further and get an explicit list of all objects in the schemas being migrated that do not exist in the target database, so we’d like to check by name, for example…

SELECT owner, object_type, object_name FROM dba_objects@sourcedatabase
WHERE owner in ('SCHEMA1')
MINUS
SELECT owner, object_type, object_name FROM dba_objects@targetdatabase
WHERE owner in ('SCHEMA1')
/

However, this approach only really works where we can guarantee that all objects have been user-named and not system-named. For example, indexes are often created with system-generated names (e.g. SYS_C009905), and if we are using datapump or original export/import to migrate our data then the system-generated names on the target database will almost certainly be different from those used on the source database. It means the query above may find loads of indexes in our source database which don't exist with the same name in our target database.

On a practical level, just generating object counts per user is probably evidence enough. We could go a step further and report the number of indexes per table and compare those between our source and target databases, for example, but that doesn't add much value and may generate very large reports for schemas that own hundreds or thousands of tables.
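
If that extra level of detail is ever required, a per-table index count is straightforward to produce and compare between the two databases; a sketch:

select table_owner, table_name, count(*) index_count
from dba_indexes
where table_owner = 'SCHEMA1'
group by table_owner, table_name
order by 1, 2
/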

Oracle Migrations – Identifying Dependent Users

Identifying dependent users and roles for a given list of schema accounts in support of database migrations.

The first step in preparing for the migration of an application database is to analyse how it is configured in its current environment. That gives the information needed to plan and configure the target database and to plan the migration.

One question that we need answers to is this: if I have identified one or more schema accounts for an application, which roles, other schema accounts, service accounts and end user accounts use or have dependencies on these accounts?

This helps inform us what other accounts need to be migrated or remediated when we move our application schema(s), so we can plan accordingly.

We would typically start by consulting with the relevant technical support teams, and maybe review the available documentation and user management systems, to see what accounts might relate to the application in question. We would then analyse the database to see what is defined in the data dictionary. But the database won't always tell the whole story, so we would normally go back to the tech support folks for further clarification if we find discrepancies. For example, it is quite common to find apparently redundant accounts and roles, and unexpected dependencies that nobody seems to be aware of.

Starting with dependencies…

The scenario might be as follows (note that for clarity I have named accounts and roles SCHEMAn, USERn or ROLEn): we are told that the schema for a particular application is called SCHEMA1. So we query dba_dependencies and maybe get this…

SQL> select owner, name, referenced_name from dba_dependencies where referenced_owner = 'SCHEMA1'
2  /

OWNER    NAME         REFERENCED_N
-------- ------------ ------------
SCHEMA2  V1           T1

So it looks like one table in the application schema has a dependent object owned by SCHEMA2. Well, maybe not…

SQL> select owner, name, referenced_name from dba_dependencies where referenced_owner = 'SCHEMA2'
2  /

OWNER    NAME         REFERENCED_N
-------- ------------ ------------
SCHEMA3  V3           V1

Hmm, so there is an extra level of nesting. This could go on and on, and whilst it's an ugly way to do things in my opinion, we keep seeing scenarios like this, so we need an easy way to report on it.

Tree walking comes to the rescue…

1  select owner, name, referenced_name
2  from dba_dependencies
3  start with referenced_owner = 'SCHEMA1'
4* connect by prior owner = referenced_owner
SQL> /

OWNER    NAME         REFERENCED_N
-------- ------------ ------------
SCHEMA2  V1           T1
SCHEMA3  V3           V1

Now we have the full tree of nested dependencies. Table SCHEMA1.T1 has a dependent view SCHEMA2.V1, which in turn has a dependent view SCHEMA3.V3.

We have identified that schema accounts SCHEMA2 and SCHEMA3 also relate in some way to the application we want to migrate and we need to evaluate how to manage them in the migration, whether they should be migrated with SCHEMA1, removed, remediated, or just left as they are.

So this is an example of where we would go back to the App Support folks to seek clarification about these two additional dependent schemas and what to do about them.

So what about dependent end users, service account users and roles?

We want to know what users and roles have privileges on objects in our application schema accounts, and whilst we can easily identify which users have privileges directly granted on our application schema’s objects, finding those that have been indirectly granted via a series of roles for example can be a bit trickier…

1  select table_name, grantee, privilege
2  from dba_tab_privs where owner = 'SCHEMA1'
3* order by 1,2
SQL> /

TABLE_NAME      GRANTEE        PRIVILEGE
--------------- -------------- -----------------------------------
T1              ROLE1          SELECT
T1              SCHEMA2        SELECT
T1              USER1          SELECT

Here we’ve identified one schema, one end user and one role that have privileges on SCHEMA1’s objects. But the privilege granted to that role is suspicious; is the role then granted to something else? What else is not shown by this query? Let’s try tree walking again…

SELECT grantee, granted_role from (
SELECT grantee , null granted_role
FROM  dba_tab_privs
WHERE owner in ('SCHEMA1')
union
SELECT grantee , granted_role
from dba_role_privs
)
start with granted_role is null
connect by granted_role = prior grantee
order by 1
SQL> /

GRANTEE        GRANTED_ROLE
-------------- --------------------
ROLE1
ROLE2          ROLE1
ROLE3          ROLE2
SCHEMA2
SCHEMA4
USER1
USER2          ROLE3
USER3          ROLE3

So there are three roles, two other schema accounts and three end user accounts with directly or indirectly granted privileges on objects in the schema we are interested in. As with the dependencies we saw earlier, we need to evaluate how to manage these in the migration, so we would normally go back to app support to get agreement on whether these accounts and roles should be migrated with SCHEMA1, removed, remediated, or just left as they are.

Caveats

If any object privilege is granted to PUBLIC then we have no 100% reliable way to determine which end user accounts, service accounts or roles might be making use of that privilege, so in that case we would need to discuss it with application support.
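
It is worth checking for that situation explicitly as part of the analysis; a minimal sketch:

select owner, table_name, privilege
from dba_tab_privs
where owner = 'SCHEMA1'
and   grantee = 'PUBLIC'
order by 1, 2, 3
/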

Conclusion

So by using tree walking we can develop simple queries to give a comprehensive list of database users with some sort of direct or indirect dependency or privilege on the application schema(s) we’re interested in.

However, these kinds of queries may not always show the whole story and we must always validate our findings through consultation with the relevant application support teams.

Oracle Export/Import: Tips and Tricks (2)

Some thoughts about using named pipes with ssh and rolling export/import to work around space limitations when executing export/import in big migration projects.

In my previous blog I mentioned some tips for using Oracle's original exp/imp utility in combination with named pipes to deal with the scenario where we have limited storage available on a server. I showed how we can use original export/import with named pipes to execute dummy exports, giving us accurate metrics on the likely size of the export dump file, and how pushing the export through a named pipe to gzip the output inline is a well-known trick that typically gives us a dump file about 20% of the size it would otherwise be.

But what if even that is too big for the available storage? Here’s some more ideas…

ssh & named pipes – exporting to a remote server

For Original Export/Import we can combine a named pipe on our database server with an ssh process on a separate server that will collect the output from exp and write it out to that server. This allows us to configure a so-called “Bridge Server” with its own pool of storage that can be reused for multiple migrations through the life of the project. It means we always have enough storage for our dumps whilst avoiding the pain of having to continuously attach and remove storage volumes from our legacy servers, which can be time-consuming, may require outages and may be impossible for some environments.

So, assume you have a DB server DBSRV1, from which we want to execute an export, and a Bridge Server BRSRV2, configured with lots of lovely storage. The way it might work is this…

Step 1: Create a named pipe on the DB Server…

@DBSRV1$ mkfifo /tmp/exp.pipe

Step 2: Start a ssh process on the Bridge Server to cat the output from the pipe and write it out to a dump file…

@BRSRV2$ nohup ssh -q DBSRV1 "cat /tmp/exp.pipe" > /u01/app/oracle/MigDumps/myexp.dmp &

…so now we have a pipe set up on the DB server and a process on the Bridge Server waiting to read whatever comes out of it and write it to a dump file.

Step 3: Start the export on the DB server…

@DBSRV1$ exp un/pw file=/tmp/exp.pipe …

And away it goes. We will be writing across a public LAN, so it may be slow, depending on things like network load, whether there are firewalls in the way or whatever, but in most cases it will be fast enough in my experience.
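
The same trick works in reverse for the import. A sketch, assuming the target database server is called TGTSRV1 (an illustrative name) and the dump was written to the Bridge Server as above:

@TGTSRV1$ mkfifo /tmp/imp.pipe

@TGTSRV1$ nohup ssh -q BRSRV2 "cat /u01/app/oracle/MigDumps/myexp.dmp" > /tmp/imp.pipe &

@TGTSRV1$ imp un/pw file=/tmp/imp.pipe …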

Rolling Migrations with Filesize

The next tip is to consider the little-remembered FILESIZE parameter, which is available for both datapump and original export/import and allows us to execute what I call a rolling export/import.

FILESIZE specifies the maximum size for each dump file that export will create. Export will write only the number of bytes you specify to each dump file. When the amount of data Export must write exceeds the maximum value you specified for FILESIZE, it will get the name of the next export file from the FILE parameter and continue exporting into that file.

So how does this help us? Well, as each file closes we can copy it to the target server, validate it and then delete it from the source server. It means we don't need storage to accommodate the entire dump for an export, only for the files currently being exported to or those being copied to the target environment.

An example using Original Export Import…

@DBSRV1$ exp un/pw filesize=100MB file=exp1.dmp,exp2.dmp,exp3.dmp,…,exp10.dmp …

In this case export will write to file exp1.dmp until it is 100MB in size, then it will start writing to exp2.dmp. At that point we can copy the exp1.dmp file to our target server and, if required, start the import. When exp2.dmp is completed we can copy that, validate it, delete it from the source server and continue the import, and so on. We would need to execute the import in interactive mode so that once it has imported from exp1.dmp it will prompt and wait for the next file.

Thus we are starting the import whilst the export is still running, saving time, and we only need enough storage on our source server to accommodate a few files at a time, not the whole dump file set.

Clearly, this is not an elegant solution: it's difficult to be sure how many files will exist on the source server at any one time, and the technique is not easily automated, so it will probably need a dedicated DBA to manage it manually from end to end. I would only recommend using it in exceptional circumstances.

Pipes and filesize combined

One variation of this rolling export/import technique, for Original Export/Import only, is to combine it with named pipes and gzip. This will generate multiple files in gzipped format, so they will be much faster to copy to the target environment.

So we might do this…

Step 1: Execute a dummy export to accurately size the export dump…

mknod /tmp/exp.pipe p

dd if=/tmp/exp.pipe of=/dev/null bs=1048576 &

exp myschema/mypwd  file=/tmp/exp.pipe

Step 2: Decide how many files to use for the export.

We'd need to rehearse the migration once or twice to get the optimal filesize but, as an example, if the dummy export tells us the dump size will be 100GB and we select a filesize of 10GB, we will generate about 10 files. If they are compressed inline using named pipes and gzip, that will produce gzipped files of about 2GB each. A typical file transfer accelerator on a 100 Mbps line will move those at a rate of maybe 30 GB/hr, so each 2GB file will take 4 or 5 minutes.

Step 3: Define named pipes and start a background inline gzip process for each of the files we expect to export to

mkfifo /tmp/exp1.pipe

mkfifo /tmp/exp2.pipe

mkfifo /tmp/exp10.pipe

gzip -c < /tmp/exp1.pipe > filename1.gz &

gzip -c < /tmp/exp2.pipe > filename2.gz &

gzip -c < /tmp/exp10.pipe > filename10.gz &
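
Rather than typing each of those out, the pipes and background gzip processes can be created with a small shell loop; a sketch assuming ten files:

for i in $(seq 1 10)
do
  mkfifo /tmp/exp$i.pipe
  gzip -c < /tmp/exp$i.pipe > filename$i.gz &
done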

Step 4: Start the export

exp un/pwd file=/tmp/exp1.pipe,/tmp/exp2.pipe,…,/tmp/exp10.pipe parfile=exp.par

Step 5: Monitor the export logfile to see when each file is completed

tail -f exp.log

Step 6: When the first dump is written, copy it to the target environment and begin the import.
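
For example, once exp.log shows the first dump file is complete, the copy and the start of the import might look like this (a sketch; TGTSRV1 and the paths are illustrative, and each subsequent file would be gunzipped into the pipe in the same way as imp prompts for it):

@DBSRV1$ scp filename1.gz oracle@TGTSRV1:/u01/app/oracle/MigDumps/

@TGTSRV1$ mkfifo /tmp/imp.pipe
@TGTSRV1$ gzip -dc /u01/app/oracle/MigDumps/filename1.gz > /tmp/imp.pipe &
@TGTSRV1$ imp un/pwd file=/tmp/imp.pipe parfile=imp.par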

Oracle Export/Import: Tips and Tricks (1)

Some tips about how to identify and work around limited storage for dump files when working with export/import…

In the big database migration projects I work on we are presented with a wide range of technical options for migrating our databases, but in my experience logical backup/restore – that is, Original Export/Import and Datapump – is the method of choice for the overwhelming majority of the migrations we need to do, because it offers flexibility like you wouldn't believe compared to other methods.

There are some particular challenges with these utilities: we need somewhere to write the dump files, which can be difficult on mature systems where storage is limited and there are few options to add more, and they can be too slow for some migrations.

In this series of blogs I will present tips and tricks for optimising the performance and storage utilisation of Exp/Imp and Datapump, and I will also illustrate the use of some other features that are particularly useful in these kinds of big DB migration projects. Some of this stuff will be old news to experienced DBAs, but I hope it will offer some value.

So let's start with an old one: how to optimise use of available storage…

Requirement: We are required to migrate a schema from a 9i database. We are not sure if there is sufficient storage available on the old server and we cannot add any more, so we need to figure out whether exp/imp is a viable option for migrating this schema.

Step 1: Size the dump file using dba_segments

I usually first look in dba_segments and see how big the table and lob segments are for the schema we’re interested in…

SELECT round(sum(bytes)/1048576) MBytes from dba_segments
where owner = 'MYSCHEMA' and segment_type not like '%INDEX%'

We are excluding index related segments since they occupy no significant space within an export dump file; only the metadata to recreate the index segments is written to the dump file.

Of course, segment size is not an accurate predictor of how big the eventual dump file will be, since the bytes column in dba_segments reflects the blocks allocated to a segment regardless of how much data they actually contain. But this query is a quick check, and if it gives us a number that will comfortably fit into the space we have available then we need look no further; we have enough space.

Step 2: Size the dump file using a dummy export

If the schema segment size looks too big based on the segment size check or we need more information, the next step is to execute a dummy export. Here we can make use of named pipes in a unix/linux environment to get a pretty accurate estimate of the likely dump file size.

Named pipes provide inter-process communication, channelling the output of one command in a Unix/Linux environment into another command. Used with Oracle's original export/import they give us some pretty cool options, as we will see.

Based on the example in Oracle Metalink Note 106465.1 performing a dummy export works like this…

  • Create a pipe called exp.pipe (syntax may differ depending on platform):

cd /tmp

mknod exp.pipe p

  • Start reading from the exp.pipe using dd, dump the output to the bit bucket (/dev/null), set blocksize to 1K and execute in background:

dd if=/tmp/exp.pipe of=/dev/null bs=1024 &

  • Start the export

exp myschema/mypwd file=/tmp/exp.pipe …

  • At the end of the export, look for the number of records written…

Export Terminated successfully without warnings.

5+0 records in

5+0 records out

This tells us dd read 5 x 1K records, so our dump file would be 5 KB in total; and because we are writing to /dev/null we don't consume any actual storage by running this test. Obviously, in a real migration the numbers will be very much larger.

For datapump it’s a little more straightforward since we can use the ESTIMATE_ONLY parameter to achieve the same thing.
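
For example, something along these lines reports the estimated dump size without writing any dump file (a sketch; the schema name is a placeholder):

expdp un/pwd schemas=MYSCHEMA estimate_only=yes nologfile=yes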

As well as confirming whether we have enough storage, this figure allows us to estimate how long it is likely to take to copy the files to the target environment.

It will also provide a crude metric for how long the export will take. Reading data and throwing it into /dev/null is faster than writing a real dump file, so it will not give an accurate export time; but if even the dummy export takes longer than the time we have available, then we already know we will have to rethink our plans for this migration – maybe undertake some baseline testing, tune the export or renegotiate the migration window.

Step 3: Zip the export using named pipes and gzip

If the dummy export suggests the dump is still too big for the available space we can then look at using named pipes again, this time to zip up the file using gzip.

We create a named pipe, redirect its output into gzip and write the output from that into our target file name. Then start the export, using the pipe as the filename…

mkfifo /tmp/exp.pipe

gzip -c < /tmp/exp.pipe > filename.gz &

exp un/pwd file=/tmp/exp.pipe parfile=exp.par

Typically gzip will reduce the dump file to about a fifth of its original size. Once the export is complete we copy the gzipped file to the target server and import it through a named pipe in the same way, using the following commands…

mkfifo /tmp/exp.pipe

gzip -dc filename.gz > /tmp/exp.pipe &

imp un/pwd file=/tmp/exp.pipe parfile=exp.par

This is an old trick with many variants (using compress instead of gzip, for example), but it's still worth its weight in gold when working with legacy systems.

In the next Blog I will show an example of combining ssh with named pipes to allow exp to write to a dump file on another server and an example of a “rolling export”.