Step-by-step guide. Oracle's expdp/impdp commands to exclude/include objects and mask sensitive data (on-premise)

Let's say I had to refresh my development or uat server with production data (all my servers are on-premise).

Why? For easier debugging or developing new features.

It wouldn't be a big deal but:

  • We want it now!

  • We have to exclude/include specified tables.

  • We have to anonymize/change sensitive values like salaries, names etc.

Originally I planned to use a "data pump" feature built in SQLcl - but I couldn't find a way to change data in columns in the exported file.

That's why I chose Oracle's expdp and impdp tools.

My example.

In my case, I'm moving data from PROD schema to the TEST schema.

  • PROD schema has 50 various objects. The TEST schema is empty.

  • I want to export only one table called CUSTOMERS from schema PROD

  • My table PROD.CUSTOMERS have column VALUE. This is sensitive data. I want this value to be changed to 999 for all rows.

  • All objects beside PROD.CUSTOMERS shouldn't be included in my export file

Create a directory to store exported schema.

--use SYS for that
sql sys/*****@localhost:1521/xepdb1 as sysdba

/*select * from all_directories;*/
CREATE OR REPLACE DIRECTORY DATA_PUMP_DIR AS '/opt/oracle/admin/XE/dpdump/your_data';

Add grants for your dedicated user.

I previously created EXP_IMP_USER (user is up to you), so it can export and import data.

grant DATAPUMP_EXP_FULL_DATABASE to EXP_IMP_USER;
grant DATAPUMP_IMP_FULL_DATABASE to EXP_IMP_USER;
GRANT READ, WRITE ON DIRECTORY DATA_PUMP_DIR TO EXP_IMP_USER;

Create stored logic to change data in the VALUE column

I created this package on my EXP_IMP_USER schema (thanks for this article by Ronald)

CREATE OR REPLACE PACKAGE remove_data AUTHID CURRENT_USER
AS
  FUNCTION changed_number(old_number IN NUMBER) RETURN NUMBER;
END remove_data;
/

CREATE OR REPLACE PACKAGE BODY remove_data AS

FUNCTION changed_number(old_number IN NUMBER) RETURN NUMBER
IS
BEGIN
   RETURN 999;
END;

END remove_data;
/

Prepare parfile. Use REMAP_DATA.

Create a parameter file called parfile.txt. It is needed to store all the logic and magic to change sensitive data in the export file.

Inside parfile.txt, add as many lines as you need in the format

<schema_name>.<table_name>.<column_name>:<package_name>.<function_name>

In my case, the content of the file is as follows:

REMAP_DATA=PROD.customers.value:remove_data.changed_number

So, to change data in column PROD.CUSTOMERS.VALUES, I will use my function remove_data.changed_number.

This function will change all numbers in this column to 999.

Execute EXPDP with parameters.

expdp EXP_IMP_USER/*****@localhost:1521/xepdb1 PARFILE=parfile.txt SCHEMAS=PROD INCLUDE=TABLE:\"LIKE \'%CUSTOMERS\'\" DIRECTORY=DATA_PUMP_DIR DUMPFILE=PROD.dmp LOGFILE=expdpPROD.log
  • PARFILE - location of parflile.txt

  • SCHEMAS - PROD schema I want to export

  • INCLUDE or EXCLUDE syntax. If you see errors regarding bad syntax, read those articles - link1, link2. Depending on your OS (I'm using Mac OS), you might have to change it slightly.

  • DIRECTORY -s erver directory, created at the first step. A data export file will be created there.

  • DUMPFILE - export file with all the data I want

  • logfile

After running this command, you will have two files created in your server directory - PROD.dmp and expdpPROD.log

Use IMPDP to import data.

impdp EXP_IMP_USER/*****@localhost:1521/xepdb SCHEMAS=PROD REMAP_SCHEMA=PROD:TEST DIRECTORY=DATA_PUMP_DIR DUMPFILE=PROD.dmp LOGFILE=impdp.log
  • SCHEMAS - PROD schema I want to export ( I could've more schemas in one file, that's why I'm specifying what I want)

  • REMAP_SCHEMA - my target schema is different from the source, so I need to remap it

  • [optional] REMAP_TABLESPACE if your target tablespace is different. Use "OLD_TABLESPACE: NEW_TABLESPACE"

  • DIRECTORY - location of my export file

  • DUMPFILE - name of the file I want to import

  • LOGFILE - name of a log file that will be produced

Useful links:

  • More about expdp, impdp here.

  • Details of parameter file "parfile".

  • More about remap_data and excluding sensitive data by Ronald here.

Sooner or later, I will prepare a version for OCI Cloud servers.

Stay tuned.