Oracle Documentation
Change Data Capture
Change Data Capture efficiently identifies and captures data that has been added to, updated in, or removed from, Oracle relational tables and makes this change data available for use by applications or individuals.
This chapter describes Change Data Capture in the following sections:
See Oracle Database PL/SQL Packages and Types Reference for reference information about the Change Data Capture publish and subscribe PL/SQL packages.
Overview of Change Data Capture
Often, data warehousing involves the extraction and transportation of relational data from one or more production databases into a data warehouse for analysis. Change Data Capture quickly identifies and processes only the data that has changed and makes the change data available for further use.
Capturing Change Data Without Change Data Capture
Prior to the introduction of Change Data Capture, there were a number of ways that users could capture change data, including table differencing and change-value selection.
Table differencing involves transporting a copy of an entire table from the source (production) database to the staging database (where the change data is captured), where an older version of the table already exists. Using the SQL
MINUS
operator, you can obtain the inserted and new versions of updated rows with the following query:SELECT * FROM new_version MINUS SELECT * FROM old_version;
Moreover, you can obtain the deleted rows and old versions of updated rows with the following query:
SELECT * FROM old_version MINUS SELECT * FROM new_version;
However, there are several problems with this method:
- It requires that the new version of the entire table be transported to the staging database, not just the change data, thereby greatly increasing transport costs.
- The computational cost of performing the two
MINUS
operations on the staging database can be very high. - Table differencing cannot capture data that have reverted to their old values. For example, suppose the price of a product changes several times between the old version and the new version of the product's table. If the price in the new version ends up being the same as the old, table differencing cannot detect that the price has fluctuated. Moreover, any intermediate price values between the old and new versions of the product's table cannot be captured using table differencing.
- There is no way to determine which changes were made as part of the same transaction. For example, suppose a sales manager creates a special discount to close a deal. The fact that the creation of the discount and the creation of the sale occurred as part of the same transaction cannot be captured, unless the source database is specifically designed to do so.
Change-value selection involves capturing the data on the source database by selecting the new and changed data from the source tables based on the value of a specific column. For example, suppose the source table has a
LAST_UPDATE_DATE
column. To capture changes, you base your selection from the source table on the LAST_UPDATE_DATE
column value.
However, there are also several limitations with this method:
- The overhead of capturing the change data must be borne on the source database, and you must run potentially expensive queries against the source table on the source database. The need for these queries may force you to add indexes that would otherwise be unneeded. There is no way to offload this overhead to the staging database.
- This method is no better at capturing intermediate values than the table differencing method. If the price in the product's table fluctuates, you will not be able to capture all the intermediate values, or even tell if the price had changed, if the ending value is the same as it was the last time that you captured change data.
- This method is also no better than the table differencing method at capturing which data changes were made together in the same transaction. If you need to capture information concerning which changes occurred together in the same transaction, you must include specific designs for this purpose in your source database.
- The granularity of the change-value column may not be fine enough to uniquely identify the new and changed rows. For example, suppose the following:
- You capture data changes using change-value selection on a date column such as
LAST_UPDATE_DATE
. - The capture happens at a particular instant in time, 14-FEB-2003 17:10:00.
- Additional updates occur to the table during the same second that you performed your capture.
When you next capture data changes, you will select rows with aLAST_UPDATE_DATE
strictly after 14-FEB-2003 17:10:00, and thereby miss the changes that occurred during the remainder of that second.To use change-value selection, you either have to accept that anomaly, add an artificial change-value column with the granularity you need, or lock out changes to the source table during the capture process, thereby further burdening the performance of the source database. - You have to design your source database in advance with this capture mechanism in mind – all tables from which you wish to capture change data must have a change-value column. If you want to build a data warehouse with data sources from legacy systems, those legacy systems may not supply the necessary change-value columns you need.
Change Data Capture does not depend on expensive and cumbersome table differencing or change-value selection mechanisms. Instead, it captures the change data resulting from
INSERT
, UPDATE
, and DELETE
operations made to user tables. The change data is then stored in a relational table called a change table, and the change data is made available to applications or individuals in a controlled way.Capturing Change Data with Change Data Capture
Change Data Capture can capture and publish committed change data in either of the following modes:
- SynchronousTriggers on the source database allow change data to be captured immediately, as each SQL statement that performs a data manipulation language (DML) operation (
INSERT
,UPDATE
, orDELETE
) is made. In this mode, change data is captured as part of the transaction modifying the source table. Synchronous Change Data Capture is available with Oracle Standard Edition and Enterprise Edition. This mode is described in detail in "Synchronous Change Data Capture". - AsynchronousBy taking advantage of the data sent to the redo log files, change data is captured after a SQL statement that performs a DML operation is committed. In this mode, change data is not captured as part of the transaction that is modifying the source table, and therefore has no effect on that transaction.There are three modes of asynchronous Change Data Capture: HotLog, Distributed HotLog, and AutoLog. These modes are described in detail in "Asynchronous Change Data Capture".Asynchronous Change Data Capture is built on, and provides a relational interface to, Oracle Streams. See Oracle Streams Concepts and Administration for information on Oracle Streams.
- CompletenessChange Data Capture can capture all effects of
INSERT
,UPDATE
, andDELETE
operations, including data values before and afterUPDATE
operations. - PerformanceAsynchronous Change Data Capture can be configured to have minimal performance impact on the source database.
- Interface
- CostChange Data Capture reduces overhead cost because it simplifies the extraction of change data from the database and is part of the Oracle Database.
Note that you cannot use any table that uses transparent data encryption as a source table for synchronous Change Data Capture. Asynchronous Change Data Capture supports transparent data encryption if both the source and staging databases have
COMPATIBLE
set to 11 or higher. Change Data Capture will not encrypt the data in the change table. A user who wants to encrypt a column in the change table can manually use an ALTER
TABLE
statement to encrypt the column in the change table. See Oracle Streams Concepts and Administration for information on Oracle Streams.
A Change Data Capture system is based on the interaction of publishers and subscribers to capture and distribute change data, as described in the next section.
Publish and Subscribe Model
Most Change Data Capture systems have one person who captures and publishes change data; this person is the publisher. There can be multiple applications or individuals that access the change data; these applications and individuals are the subscribers. Change Data Capture provides PL/SQL packages to accomplish the publish and subscribe tasks.
The following sections describe the roles of the publisher and subscriber in detail. Subsequent sections describe change sources, more about modes of Change Data Capture, and change tables.
Publisher
The publisher is usually a database administrator (DBA) who creates and maintains the schema objects that make up the Change Data Capture system. For all modes of Change Data Capture, except Distributed HotLog, there is typically one publisher on the staging database. For the Distributed HotLog mode of Change Data Capture there needs to be a publisher defined on the source and staging databases. The following list describes the source and staging databases and the objects of interest to Change Data Capture on each:
-
This is the production database that contains the data of interest. The following objects of interest to Change Data Capture reside on the source database:
- The source tablesThe source tables are the production database tables that contain the data of interest. They may be all or a subset of the source database tables.
- Redo log filesFor asynchronous modes of change data capture, the change data is collected from either the online or archived redo log files (or both). For asynchronous AutoLog mode, archived redo log files are copied from the source database to the staging database.
- Change sourceThe change source is a logical representation of the source database. The method for representing the source varies, depending on the mode of Change Data Capture.For the asynchronous Distributed HotLog mode of Change Database capture only, the change source resides on the source database. For the other modes of Change Data Capture, the change source resides on the staging database.See "Asynchronous Distributed HotLog Mode" for details about the Distributed HotLog change source.
-
This is the database to which the captured change data is applied. Depending on the capture mode that the publisher uses, the staging database can be the same as, or different from, the source database. The following Change Data Capture objects reside on the staging database:
- Change tableA change table is a relational table into which change data for a single source table is loaded. To subscribers, a change table is known as a publication.
- Change setA change set is a set of change data that is guaranteed to be transactionally consistent. It contains one or more change tables.
- Change source
- Synchronous - See "Synchronous Change Data Capture" for details.
- Asynchronous HotLog - See "Asynchronous HotLog Mode" for details.
- Asynchronous AutoLog - See "Asynchronous AutoLog Mode" for details.
These are the main tasks performed by the publisher:
- Allows subscribers to have controlled access to the change data in the change tables by using the SQL
GRANT
andREVOKE
statements to grant and revoke theSELECT
privilege on change tables for users and roles. (Keep in mind, however, that subscribers use views, not change tables directly, to access change data.)
In Figure 16-1, the publisher determines that subscribers are interested in viewing change data from the
HQ
source database. In particular, subscribers are interested in change data from the sh.sales
and sh.promotions
source tables.
The publisher decides to use the asynchronous AutoLog mode of capturing change data. On the
DW
staging database, he creates a change source HQ_SRC
, a change set, SH_SET
, and two change tables: sales_ct
and promo_ct
. The sales_ct
change table contains all the columns from the source table, sh.sales
. For the promo_ct
change table, however, the publisher has decided to exclude the PROMO_COST
column.Subscribers
The subscribers are consumers of the published change data. A subscriber performs the following tasks:
-
- Create subscriptionsA subscription controls access to the change data from one or more source tables of interest within a single change set. A subscription contains one or more subscriber views.A subscriber view is a view that specifies the change data from a specific publication in a subscription. The subscriber is restricted to seeing change data that the publisher has published and has granted the subscriber access to use. See "Subscribing to Change Data"for more information on choosing a method for specifying a subscriber view.
- Notify Change Data Capture when ready to receive a set of change dataA subscription window defines the time range of rows in a publication that the subscriber can currently see in subscriber views. The oldest row in the window is called the low boundary; the newest row in the window is called the high boundary. Each subscription has its own subscription window that applies to all of its subscriber views.
- Notify Change Data Capture when finished with a set of change data
A subscriber has the privileges of the user account under which the subscriber is running, plus any additional privileges that have been granted to the subscriber.
In Figure 16-2, the subscriber is interested in a subset of columns that the publisher (in Figure 16-1) has published. Note that thepublications shown in Figure 16-2, are represented as change tables in Figure 16-1; this reflects the different terminology used by subscribers and publishers, respectively.
The subscriber creates a subscription,
sales_promos_list
and two subscriber views (spl_sales
and spl_promos
) on the SH_SET
change set on the DW staging database. Within each subscriber view, the subscriber includes a subset of the columns that were made available by the publisher. Note that because the publisher did not create a change table that includes the PROMO_COST
column, there is no way for the subscriber to view change data for that column. The subscriber need not be aware of the mode of change data capture employed by the publisher.- Guarantees that each subscriber sees all the changes
- Keeps track of multiple subscribers and gives each subscriber shared access to change data
- Handles all the storage management by automatically removing data from change tables when it is no longer required by any of the subscribers. Keep in mind that Change Data Capture starts a job in the job queue that runs once every 24 hours for handling purging. Many things can go wrong with this job (such as if it is deleted or the schedule is changed), so this automatic processing depends on the job queue process being up and running and the Change Data Capture job being there. Also, in logical standby environments, the purge job is not submitted.Note:Oracle provides the previously listed benefits only when the subscriber accesses change data through a subscriber view.
Change Sources and Modes of Change Data Capture
Change Data Capture provides synchronous and asynchronous modes for capturing change data. The following sections summarize how each mode of Change Data Capture is performed, and the change source associated with each mode of Change Data Capture.
Synchronous Change Data Capture
The synchronous mode uses triggers on the source database to capture change data. It has no latency because the change data is captured continuously and in real time on the source database. The change tables are populated when DML operations on the source table are committed.
There is a single, predefined synchronous change source,
SYNC_SOURCE
, that represents the source database. This is the only synchronous change source. It cannot be altered or dropped.
While the synchronous mode of Change Data Capture adds overhead to the source database at capture time, this mode can reduce costs (as compared to attempting to extract change data using table differencing or change-value section) by simplifying the extraction of change data.
Change tables for this mode of Change Data Capture must reside locally in the source database.
Figure 16-3 illustrates the synchronous configuration. Triggers executed after DML operations occur on the source tables populate the change tables in the change sets within the
SYNC_SOURCE
change source.Asynchronous Change Data Capture
The asynchronous modes capture change data from the database redo log files after changes have been committed to the source database.
The asynchronous modes of Change Data Capture are dependent on the level of supplemental logging enabled at the source database. Supplemental logging adds redo logging overhead at the source database, so it must be carefully balanced with the needs of the applications or individuals using Change Data Capture. See "Asynchronous Change Data Capture and Supplemental Logging" for information on supplemental logging.
The three modes of capturing change data are described in the following sections:
Asynchronous HotLog Mode
In the asynchronous HotLog mode, change data is captured from the online redo log file on the source database. There is a brief latency between the act of committing source table transactions and the arrival of change data.
There is a single, predefined HotLog change source,
HOTLOG_SOURCE
, that represents the current online redo log files of the source database. This is the only HotLog change source. It cannot be altered or dropped.
Change tables for this mode of Change Data Capture must reside locally in the source database.
Figure 16-4 illustrates the asynchronous HotLog configuration. The Logwriter Process (
LGWR
) records committed transactions in the online redo log files on the source database. Change Data Capture uses Oracle Streams processes to automatically populate the change tables in the change sets within the HOTLOG_SOURCE
change source as newly committed transactions arrive.Asynchronous Distributed HotLog Mode
In the asynchronous Distributed HotLog mode, change data is captured from the online redo log file on the source database.
There is no predefined Distributed HotLog change source. Unlike other modes of Change Data Capture, the Distributed HotLog mode splits change data capture activities and objects across the source and staging database. Change sources are defined on the source database by the staging database publisher.
A Distributed HotLog change source represents the current online redo log files of the source database. However, staging database publishers can define multiple Distributed HotLog change sources, each of which contains change sets on a different staging database. The source and staging database can be on different hardware platforms and be running different operating systems, however some restrictions apply. See "Summary of Supported Distributed HotLog Configurations and Restrictions" for information on these restrictions.
Figure 16-5 illustrates the asynchronous Distributed HotLog configuration. The change source on the source database captures change data from the online redo log files and uses Streams to propagate it to the change set on the staging database. The change set on the staging database populates the change tables within the change set.
There are two publishers required for this mode of Change Data Capture, one on the source database and one on the staging database. The source database publisher defines a database link on the source database to connect to the staging database as the staging database publisher. The staging database publisher defines a database link on the staging database to connect to the source database on the source database publisher. All publishing operations are performed by the staging database publisher. See "Performing Asynchronous Distributed HotLog Publishing" for details.
Asynchronous AutoLog Mode
In the asynchronous AutoLog mode, change data is captured from a set of redo log files managed by redo transport services. Redo transport services control the automated transfer of redo log files from the source database to the staging database. Using database initialization parameters (described in "Initialization Parameters for Asynchronous AutoLog Publishing"), the publisher configures redo transport services to copy the redo log files from the source database system to the staging database system and to automatically register the redo log files. Asynchronous AutoLog mode can obtain change data from either the source database online redo log or from source database archived redo logs. These options are known as asynchronous AutoLog online and asynchronous AutoLog archive.With the AutoLog online option, redo transport services is set up to copy redo data from the online redo log at the source database to the standby redo log at the staging database. Change sets are populated after individual source database transactions commit. There can only be one AutoLog online change source on a given staging database and it can contain only one change set.With the AutoLog archive option, redo transport services is set up to copy archived redo logs from the source database to the staging database. Change sets are populated as new archived redo log files arrive on the staging database. The degree of latency depends on the frequency of redo log file switches on the source database. The AutoLog archive option has a higher degree of latency than the AutoLog online option, but there can be as many AutoLog archive change sources as desired on a given staging database.
There is no predefined AutoLog change source. The publisher provides information about the source database to create an AutoLog change source. See "Performing Asynchronous AutoLog Publishing" for details.
Figure 16-6 shows a Change Data Capture asynchronous AutoLog online configuration in which the LGWR process on the source database copies redo data to both the online redo log file on the source database and to the standby redo log files on the staging database as specified by the
LOG_ARCHIVE_DEST_2
parameter. (Although the image presents this parameter as LOG_ARCHIVE_DEST_2
, the integer value can be any value between 1 and 10.)
Note that the LGWR process uses Oracle Net to send redo data over the network to the remote file server (RFS) process. Transmitting redo data to a remote destination requires uninterrupted connectivity through Oracle Net.
On the staging database, the RFS process writes the redo data to the standby redo log files. Then, Change Data Capture uses Oracle Streams downstream capture to populate the change tables in the change sets within the AutoLog change source.
The source database and the staging database must be running on the same hardware, operating system, and Oracle version.
Figure 16-7 shows a typical Change Data Capture asynchronous AutoLog archive configuration in which, when the redo log file switches on the source database, archiver processes archive the redo log file on the source database to the destination specified by the
LOG_ARCHIVE_DEST_1
parameter and copy the redo log file to the staging database as specified by the LOG_ARCHIVE_DEST_2
parameter. (Although the image presents these parameters as LOG_ARCHIVE_DEST_1
and LOG_ARCHIVE_DEST_2
, the integer value in these parameter strings can be any value between 1 and 10.)
Note that the archiver processes use Oracle Net to send redo data over the network to the remote file server (RFS) process. Transmitting redo log files to a remote destination requires uninterrupted connectivity through Oracle Net.
On the staging database, the RFS process writes the redo data to the copied log files. Then, Change Data Capture uses Oracle Streams downstream capture to populate the change tables in the change sets within the AutoLog change source.
See Oracle Data Guard Concepts and Administration for more information regarding Redo Transport Services.
Change Sets
A change set is a logical grouping of change data that is guaranteed to be transactionally consistent and that can be managed as a unit. A change set is a member of one (and only one) change source.
Note:
Change Data Capture change sources can contain one or more change sets with the following restrictions:- All of the change sets for a Distributed HotLog change source must be on the same staging database
- An AutoLog online change source can only contain one change set
When a publisher includes two or more change tables in the same change set, subscribers can perform join operations across the tables represented within the change set and be assured of transactional consistency.
Conceptually, a change set shares the same mode as its change source. For example, an AutoLog change set is a change set contained in an AutoLog change source. Publishers define change sets using the
DBMS_CDC_PUBLISH.CREATE_CHANGE_SET
package. In the case of synchronous Change Data Capture, the publisher can also use a predefined change set, SYNC_SET
. The SYNC_SET
change set, however, cannot be altered or dropped.
To keep the change tables in the change set from growing larger indefinitely, publishers can purge unneeded change data from change tables at the change set level. See "Purging Change Tables of Unneeded Data" for more information on purging change data.
Valid Combinations of Change Sources and Change Sets
Table 16-1 summarizes the valid combinations of change sources and change sets and indicates whether each is predefined or publisher-defined. In addition, it indicates whether the source database represented by the change source is local to or remote from the staging database, and whether the change source is used for synchronous or asynchronous Change Data Capture.
Mode | Change Source | Source Database Represented | Associated Change Sets |
---|---|---|---|
Synchronous
|
Predefined
SYNC_SOURCE |
Local
|
Predefined
SYNC_SET and publisher-defined |
Asynchronous HotLog
|
Predefined
HOTLOG_SOURCE |
Local
|
Publisher-defined
|
Asynchronous Distributed HotLog
|
Publisher-defined
|
Remote
|
Publisher-defined. Change sets must all be on the same staging database
|
Asynchronous AutoLog online
|
Publisher-defined
|
Remote
|
Publisher-defined. There can only be one change set in an AutoLog online change source
|
Asynchronous AutoLog archive
|
Publisher-defined
|
Remote
|
Publisher-defined
|
Change Tables
A given change table contains the change data resulting from DML operations performed on a given source table. A change table consists of two things: the change data itself, which is stored in a database table; and the system metadata necessary to maintain the change table, which includes control columns.
The publisher specifies the source columns that are to be included in the change table. Typically, for a change table to contain useful data, the publisher needs to include the primary key column in the change table along with any other columns of interest to subscribers. For example, suppose subscribers are interested in changes that occur to the
UNIT_COST
and the UNIT_PRICE
columns in the sh.costs
table. If the publisher does not include the PROD_ID
column in the change table, subscribers will know only that the unit cost and unit price of some products have changed, but will be unable to determine for which products these changes have occurred.
There are optional and required control columns. The required control columns are always included in a change table; the optional ones are included if specified by the publisher when creating the change table. Control columns are managed by Change Data Capture. See"Understanding Change Table Control Columns" and "Understanding TARGET_COLMAP$ and SOURCE_COLMAP$ Values" for detailed information on control columns.
Getting Information About the Change Data Capture Environment
Information about the Change Data Capture environment is provided in the static data dictionary views described in Table 16-2 and Table 16-3. Table 16-2 lists the views that are intended for use by publishers; the user must have the
SELECT_CATALOG_ROLE
privilege to access the views listed in this table. Table 16-3 lists the views that are intended for use by subscribers. Table 16-3 includes views with the prefixesALL
and USER
. These prefixes have the following general meanings:- A view with the
ALL
prefix allows the user to display all the information accessible to the user, including information from the current user's schema as well as information from objects in other schemas, if the current user has access to those objects by way of grants of privileges or roles. - A view with the
USER
prefix allows the user to display all the information from the schema of the user issuing the query without the use of additional special privileges or roles.
View Name | Description |
---|---|
ALL_CHANGE_SOURCES |
Describes existing change sources.
|
ALL_CHANGE_PROPAGATIONS |
Describes the Oracle Streams propagation associated with a given Distributed HotLog change source on the source database. This view is populated on the source database for 11.1 change sources or on the staging database for 9.2, 10.1 or 10.2 change sources.
|
ALL_CHANGE_PROPAGATION_SETS |
Describes the Oracle Streams propagation associated with a given Distributed HotLog change set on the staging database. This view is populated on the source database for 11.1 change sources or on the staging database for 9.2, 10.1 or 10.2 change sources.
|
ALL_CHANGE_SETS |
Describes existing change sets.
|
ALL_CHANGE_TABLES |
Describes existing change tables.
|
DBA_SOURCE_TABLES |
Describes all published source tables in the database.
|
DBA_PUBLISHED_COLUMNS |
Describes all published columns of source tables in the database.
|
DBA_SUBSCRIPTIONS |
Describes all subscriptions.
|
DBA_SUBSCRIBED_TABLES |
Describes all source tables to which any subscriber has subscribed.
|
DBA_SUBSCRIBED_COLUMNS |
Describes the columns of source tables to which any subscriber has subscribed.
|
View Name | Description |
---|---|
ALL_SOURCE_TABLES |
Describes all public source tables for change tables that are owned by the current user.
|
USER_SOURCE_TABLES |
Describes all public source tables for change tables that are owned by the current user.
|
ALL_PUBLISHED_COLUMNS |
Describes all published columns of source tables for change tables that are owned by the current user.
|
USER_PUBLISHED_COLUMNS |
Describes all published columns of source tables for change tables that are owned by the current user.
|
ALL_SUBSCRIPTIONS |
Describes all of the subscriptions created by the current user.
|
USER_SUBSCRIPTIONS |
Describes all of the subscriptions created by the current user.
|
ALL_SUBSCRIBED_TABLES |
Describes the source tables to which the current user has subscribed.
|
USER_SUBSCRIBED_TABLES |
Describes the source tables to which the current user has subscribed.
|
ALL_SUBSCRIBED_COLUMNS |
Describes the columns of source tables to which the current user has subscribed.
|
USER_SUBSCRIBED_COLUMNS |
Describes the columns of source tables to which the current user has subscribed.
|
See Oracle Database Reference for complete information about these views.
Preparing to Publish Change Data
This section describes the tasks the publisher should perform before starting to publish change data, information on creating publishers, information on selecting a mode in which to capture change data, instructions on setting up database links required for the asynchronous Distributed HotLog mode of Change Data Capture, and instructions on setting database initialization parameters required by Change Data Capture.
- Gather requirements from the subscribers.
- Determine which source database contains the relevant source tables.
- Choose the capture mode: synchronous, asynchronous HotLog, asynchronous Distributed HotLog, or asynchronous AutoLog, as described in "Determining the Mode in Which to Capture Data".
- Ensure that the source and staging database DBAs have set database initialization parameters, as described in "Setting Initialization Parameters for Change Data Capture Publishing" and "Publishing Change Data".
- Sets up database links from the source database to the staging database and from the staging database to the source database, as shown in "Performing Asynchronous Distributed HotLog Publishing". Be aware that this requires the source database publisher to know the username and password of the staging database publisher and the staging database publisher to know the username and password of the source database publisher.