Change Data Capture (CDC) is a useful tool in SQL Server that allows you to capture a change history for specific data. This can be particularly important when it comes to compliance with regulations such as the General Data Protection Regulation (GDPR). In this article, we will explore the concept of CDC and how to set it up in SQL Server.
Setting up CDC
Setting up CDC in SQL Server is a straightforward process. However, it is important to note that CDC is only available in Enterprise Edition or SQL Server 2016 SP1 or later.
First, you need to create a table that contains the data you want to track:
IF OBJECT_ID('dbo.CDC') IS NOT NULL
DROP TABLE dbo.CDC;
CREATE TABLE dbo.CDC (
Id INT IDENTITY(1, 1) CONSTRAINT PK_CDC PRIMARY KEY CLUSTERED,
SomeText VARCHAR(1000),
ModifiedBy VARCHAR(128),
ModifiedDate DATETIME
);
Next, enable CDC for the database:
EXEC sys.sp_cdc_enable_db;
Finally, tell SQL Server to track changes for the specified table:
EXEC sys.sp_cdc_enable_table
@source_schema = N'dbo',
@source_name = N'CDC',
@role_name = NULL;
Once these steps are completed, CDC is up and running. The change table can be found under system tables in your database, in the CDC schema, and has the name of the original table with “_CT” appended.
Working with CDC
The change table generated by CDC contains additional columns that provide useful information. The “_$operation” column indicates the type of operation that was logged (delete, insert, update), while the “_$update_mask” column shows which columns have changed in an update operation.
Here is an example of inserting and updating data and retrieving the changes from the change table:
INSERT INTO dbo.CDC (SomeText, ModifiedBy, ModifiedDate)
SELECT 'SomeText', 'Matt', GETDATE();
UPDATE dbo.CDC
SET SomeText = 'ANewValue', ModifiedBy = 'StillMatt', ModifiedDate = GETDATE();
SELECT * FROM cdc.dbo_CDC_CT;
By examining the output, you can see the rows representing the initial insert, the values before the update, and the values after the update.
How CDC Works
CDC works by parsing the transaction log of the database. It looks for changes related to the tables it is enabled for and writes those changes to the change tracking table. This process is asynchronous, meaning it does not delay the DML operations. However, it does require additional work to process the transaction log, which can impact performance.
It is important to consider the impact of CDC on log activity and size. CDC increases read activity on the I/O subsystem where the log is stored and can result in larger log files. Monitoring disk activity and log size is recommended when implementing CDC.
Managing CDC Jobs
CDC is managed by SQL Agent jobs. There are two jobs for each database with CDC enabled: the capture job and the cleanup job.
The capture job is responsible for capturing changes and has several parameters that control its behavior. These parameters can be modified using the sys.sp_cdc_change_job stored procedure.
The cleanup job manages the retention of change data. By default, it runs every morning at 2AM, but the schedule can be modified. The retention period and the maximum number of rows deleted in a single cleanup statement can be configured using the same stored procedure.
To check the current parameters for either job, you can query the system table msdb.dbo.cdc_jobs.
Understanding CDC and its configuration options can provide you with a powerful tool for tracking changes in your SQL Server database. By implementing CDC, you can ensure compliance with regulations and have a comprehensive change history for your data.