Change Data Capture allows loading data targets using only the rows that have changed since the last load. This approach is useful when dealing with large databases that would take too much time and resources to reload completely.
The source table used in this example is an inventory table called dminv that is created by running Create Sample Procedures and Data.
In this example, the source table is in DB2 or UDB.
The procedure has the following steps:
The first step is to create a data flow that creates the target table and loads it with data from the current contents of the source table.
Note: Key column(s) are required for loading a table to uniquely identify each row so that Insert/Update/Delete processing can be performed.
To use change data capture, the source database must be configured to use database logging. In most organizations this must be done by a database administrator. To enable the database for logging, see Enabling Database for Logging.
This example uses dminv as the source table. To create this table see How to Create Sample Procedures and Data.
The database table must be enabled for logging; the procedure for doing so depends on the database being used.
If using:
The table to be used must have CDC enabled. This can be done from Microsoft SQL Server Management Studio. You must be a member of the db_owner database role to run the stored procedure. Issue the following commands:
exec sys.sp_cdc_enable_table @source_schema = 'dbo', @source_name = 'dminv', @role_name = NULL
select is_tracked_by_cdc FROM sys.tables WHERE name = 'dminv'
You will see a 1 if the table is enabled, a 0 otherwise.
The table to be logged must have capture mode enabled. This can be done from the DB2 Command Center or a stored procedure.
Note: For IBM i users, from the user ID that will run the CDC flows, start the program strsql and issue the following commands:
CREATE COLLECTION database
CHGPRF CURLIB(database)
ALTER TABLE dminv DATA CAPTURE CHANGES
The table to be logged must have all data columns logged. This can be done from ORACLE SQL Developer or a stored procedure.
ALTER TABLE dminv ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
A synonym for a log table allows reading from the database log records for a selected table. The synonym contains the same columns as the table itself plus three additional columns that identify changes to the table.
This synonym can be used as a source in a data flow with the special load type of "IUD Processing." The name is an abbreviation for Insert/Update/Delete.
Note: A synonym for a log table is not a substitute for a synonym for the table itself. A synonym is still needed for the table to be able to read or write from the table directly.
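The mechanics of IUD Processing can be sketched in a few lines of Python. This is a conceptual illustration only, not DataMigrator code: the OP field and its 'I'/'U'/'D' codes are assumed names standing in for the operation column of the log synonym, and rows are matched on the key column PROD_NUM.

```python
# Conceptual sketch of Insert/Update/Delete (IUD) processing.
# The "OP" field and its 'I'/'U'/'D' codes are illustrative assumptions,
# not the actual layout of a DataMigrator log synonym.

def apply_changes(target, changes, key="PROD_NUM"):
    """Apply a stream of change records to a target keyed on `key`."""
    for change in changes:
        op = change["OP"]            # operation type from the log record
        row = change["ROW"]
        if op == "I":                # insert a new row
            target[row[key]] = row
        elif op == "U":              # update an existing row, matched by key
            target[row[key]].update(row)
        elif op == "D":              # delete the row, matched by key
            target.pop(row[key], None)
    return target

# Sample change stream: three inserts, one update, one delete.
changes = [
    {"OP": "I", "ROW": {"PROD_NUM": "2001", "PROD_NAME": "Compact Flash", "QTY_IN_STOCK": 1000}},
    {"OP": "I", "ROW": {"PROD_NUM": "2002", "PROD_NAME": "Memory Stick", "QTY_IN_STOCK": 1000}},
    {"OP": "I", "ROW": {"PROD_NUM": "2003", "PROD_NAME": "Memory Stick Pro", "QTY_IN_STOCK": 1000}},
    {"OP": "U", "ROW": {"PROD_NUM": "2002", "QTY_IN_STOCK": 500}},
    {"OP": "D", "ROW": {"PROD_NUM": "2001"}},
]

target = apply_changes({}, changes)
```

This also shows why key columns are required: without a unique key, the update and delete operations could not locate the row they apply to.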
To create a synonym for a Log table, in the DMC:
The Select Adapter window opens.
The Select Synonym Candidates window opens.
Optionally, select the Filter by owner/schema check box and enter dminv as the object name. Click Next.
Under the Default synonym name, click DMINV and change the name to DMINV_LOG, as seen in the following image.
The window now looks like this:
Note that the first three columns for the synonym are added automatically and are used for DataMigrator IUD processing. They have the following functions:
In this step, we will create a DataMigrator flow to copy data from the log table to the target table. Since there is just one source table we will use a Direct Load Flow, but a Data Flow could also be used.
The target properties page opens.
Click the X to close the properties page.
The Transformations window opens.
Click OK to close the window.
To test the change data capture process, make changes to the dminv source table. In this example, three rows are inserted, one row is updated, and one is deleted.
A stored procedure window opens.
SQL db insert into dminv values ('2001','Compact Flash',1000,50,25) ; END
SQL db insert into dminv values ('2002','Memory Stick',1000,80,40) ; END
SQL db insert into dminv values ('2003','Memory Stick Pro',1000,200,100) ; END
SQL db update dminv set QTY_IN_STOCK=500 where prod_num = '2002' ; END
SQL db delete from dminv where prod_num = '2001' ; END
SQL db commit; END
Note that the CDC configuration saves a checkpoint of the last transaction processed. If you were to run a Sample Data or Test Transforms using the synonym for the log table, that would set a checkpoint after the rows that are read, so those rows would not be processed by a subsequent DataMigrator flow.
The following image shows a Sample Data report for the log.
Note that the statistics show the number of rows inserted, updated, and deleted.
Close the report.
Note: To return the dminv table back to its original state, enter the following lines in a stored procedure, where db is the engine used, and run them:
SQL db delete from dminv where prod_num > '2001' ; END
SQL db commit; END
The final step in setting up Change Data Capture processing is scheduling the flow so that it runs periodically and processes transactions.
In addition, there are specific requirements for supported databases.
Microsoft SQL Server 2008 supports Change Data Capture in the Enterprise and Developer editions.
Change Data Capture must be enabled at the database level since it is disabled by default. This can be done from the Microsoft SQL Server Management Studio. To enable CDC, you must be connected as a member of the sysadmin server role. Connect to the database you want to enable and issue the following commands:
exec sys.sp_cdc_enable_db
select name, is_cdc_enabled from sys.databases
You will see a list of all databases and a 1 if CDC is enabled for the database, a 0 otherwise.
In addition, the SQL Server Agent and two jobs must be running. The jobs are created automatically when the database is enabled. For example, if the database is called "main" the jobs would be named "cdc.main_capture" and "cdc.main_cleanup."
To check if the SQL Server Agent is running, in the Microsoft SQL Server Management Studio in the Object Explorer window, check the icon for SQL Server Agent. It should have a green arrow. If it does not, right-click SQL Server Agent and select Start.
Each table to be used must have CDC enabled. You must be a member of the db_owner database role to run the stored procedure. Issue the following commands, where tablename is the name of the table.
exec sys.sp_cdc_enable_table @source_schema = 'dbo', @source_name = 'tablename', @role_name = NULL
select is_tracked_by_cdc FROM sys.tables WHERE name = 'tablename'
You will see a 1 if the table is enabled, a 0 otherwise.
To use Change Data Capture the database must be configured to use database logging. In most organizations this must be done by a database administrator. Note that CDC support for UDB and DB2 requires the CLI interface.
Note: For iSeries users, from the user ID that will run the CDC flow, start the program strsql and issue the following commands:
CREATE COLLECTION database
CHGPRF CURLIB(database)
UPDATE DATABASE CFG FOR database USING DFT_SQLMATHWARN YES
UPDATE DATABASE CFG FOR database USING LOGARCHMETH1 LOGRETAIN
UPDATE DBM CFG USING DISCOVER DISABLE
UPDATE DBM CFG USING DISCOVER_INST DISABLE
On all platforms, the user ID that runs the flows should be an administrator on the DataMigrator server.
Note: For DB2/400, the user that runs the flows should also be granted DBADM authority:
GRANT DBADM ON database TO USER userid
The table to be logged must have capture mode enabled. This can be done from the DB2 Command Center or a stored procedure, where tablename is the name of the table.
ALTER TABLE tablename DATA CAPTURE CHANGES
Note that if you recreate a database table, you must re-enable capture mode.
To see which tables have capture mode enabled, you can issue the following DB2 select statement.
SELECT CREATOR, NAME, TYPE FROM SYSIBM.SYSTABLES WHERE DATA_CAPTURE = 'Y'
Note: For z/OS, use DATACAPTURE (without underscore).
To use Change Data Capture, the database must be configured to use archiving. This can be done from ORACLE SQL Developer or from a stored procedure. In most organizations, this must be done by a database administrator.
ALTER SYSTEM ARCHIVE LOG START;
The date and time formats must be altered so that they can be retrieved properly.
ALTER SESSION SET NLS_DATE_FORMAT='YYYY/MM/DD HH24:MI:SS';
ALTER SESSION SET NLS_TIMESTAMP_FORMAT='YYYY/MM/DD HH24:MI:SSXFF';
To enable logging, a user ID must be granted the following privileges, where userid represents the user ID to be used.
GRANT DBA TO userid;
GRANT EXECUTE ON dbms_logmnr TO userid;
GRANT SELECT ANY TRANSACTION TO userid;
The table to be logged must have all data columns logged by issuing the following command, where tablename represents the table to be logged.
ALTER TABLE tablename ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
Note that if you recreate a database table, you must re-issue this command.
Note: Change Data Capture for ORACLE uses the DBMS_LOGMNR package to extract values from the ORACLE redo log files. This does not support the following data types: LONG, LONG RAW, CLOB, BLOB, NCLOB, ADT, and COLLECTION.
The following parameters appear in the CDC Configuration section of the synonym properties created for a table log.
Note: Each parameter defaults to the value of a global variable whose name begins with &&CDC_ and is followed by the parameter name. Normally, you would type over or select a value to replace the default with a constant value, and then save the synonym.
Alternatively, the values can be set dynamically in a process flow using a "Set Variables" object.
The following options are available:
This is set to DBMSLOG to indicate the origin of the data is a DBMS Table Log.
Indicates the starting point for reading log records.
Indicates if the last processed transaction should be saved in the checkpoint file.
Indicates the location and name of a file to use to store checkpoint information in a:
The following two parameters, LOG_NAME and LOG_LOCATION, are available on iSeries platforms only, with DB2 database. These parameters can be used when the requirement is to use a non-standard journal and/or a non-current location (for example, alternate library), where the journal and location that are explicitly manually defined by a database administrator (versus a CREATE COLLECTION). One example would be a requirement to use a remote journal, instead of the local one.
Indicates the name of the journal.
Indicates the location name.
Indicates the polling interval in seconds, that is, how often the database log is scanned. The default value is 1 second.
Indicates the timeout interval in seconds. If there is no activity in this time interval, the processing will stop. A value of zero means there is no limit. The default value is 1 second.
Note that the two options above work together. For example, if POLLING is 2 and TIMEOUT is 10, the server polls the log every 2 seconds for new transactions. If there are no new transactions after 10 seconds, then polling stops.
Indicates the maximum number of database transactions to process before stopping the job. A value of zero (0) means all transactions. The default value is 1 transaction.
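The interaction of the polling, timeout, and transaction-limit parameters can be sketched as a loop. This is a conceptual Python model, not server code: read_log stands in for one scan of the database log, and the parameter names mirror the options described above.

```python
# Conceptual model of the polling / timeout / transaction-limit behavior.
# `read_log` stands in for one scan of the database log; here it is fed
# a scripted sequence of poll results for illustration.

def run_cdc_job(read_log, polling=2, timeout=10, maxluws=0):
    """Poll for transactions; stop on inactivity timeout or transaction limit."""
    processed = 0
    idle = 0                         # seconds since the last transaction seen
    while True:
        batch = read_log()           # one poll of the log
        if batch:
            idle = 0
            for _ in batch:
                processed += 1
                if maxluws and processed >= maxluws:
                    return processed     # transaction limit reached: stop the job
        else:
            idle += polling
            if timeout and idle >= timeout:
                return processed         # no activity for `timeout` seconds: stop
        # A real job would sleep `polling` seconds here between scans.

# Scripted polls: two transactions, a quiet poll, one transaction, then silence.
polls = iter([["t1", "t2"], [], ["t3"], [], [], [], [], []])
count = run_cdc_job(lambda: next(polls), polling=2, timeout=10)
```

With polling=2 and timeout=10, as in the example above, the job scans every 2 seconds and gives up after five consecutive empty polls.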
Note: For normal Change Data Capture processing, specify a permanent location for CHKPT_FILE so that checkpoint information is preserved, and set CHKPT_SAVE to YES. Then set START to CHKPT so that processing starts at the last saved checkpoint.
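The checkpoint behavior can be illustrated with a small Python sketch. The file format and function names here are assumptions for illustration, not DataMigrator's actual CHKPT_FILE layout: when saving is enabled, the identifier of the last processed transaction is written to the checkpoint file, and a run that starts from the checkpoint resumes after it.

```python
# Illustrative sketch of checkpointing. The file format is an assumption,
# not DataMigrator's actual CHKPT_FILE layout.
import os
import tempfile

def process_log(log, chkpt_file, start="CHKPT", chkpt_save=True):
    """Process transactions after the saved checkpoint; optionally save a new one."""
    last = None
    if start == "CHKPT" and os.path.exists(chkpt_file):
        with open(chkpt_file) as f:
            last = f.read().strip() or None
    # Skip everything up to and including the checkpointed transaction.
    start_at = log.index(last) + 1 if last in log else 0
    done = log[start_at:]
    if chkpt_save and done:
        with open(chkpt_file, "w") as f:
            f.write(done[-1])        # remember the last transaction processed
    return done

chkpt = os.path.join(tempfile.mkdtemp(), "dminv.chk")
first = process_log(["t1", "t2", "t3"], chkpt)            # no checkpoint yet
second = process_log(["t1", "t2", "t3", "t4"], chkpt)     # resumes after t3
```

The first run processes everything and records "t3"; the second run picks up only the new transaction, which is the behavior a scheduled flow relies on.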
However, note that any operation that reads from the log table, including Sample Data, resets the checkpoint. In that case, the Sample Data operation consumes the data, and that data is no longer available to subsequent runs of the data flow.
iWay Software