When it comes to the extract, transformation, and load (ETL) process in SQL Server, data cleanup and transformation are crucial steps. In many cases, there are regulatory or company policy requirements that necessitate auditing any modifications made to the data during the ETL pipeline. Even if there are no specific auditing requirements, it is always a good practice to have an audit mechanism in place for any data updates or exclusions during the ETL process.
In this article, we will explore how to include a simple auditing mechanism in SQL Server Integration Services (SSIS). We will demonstrate this by extracting data from a source table, performing a lookup against another table, and updating the data if necessary. We will also create an audit table to track the changes made during the ETL process.
Let’s start by creating our source table:
CREATE TABLE OldCustomerDemographics (
CustomerID INT IDENTITY(1,1),
CustFirstName VARCHAR(50) NOT NULL,
CustLastName VARCHAR(50) NOT NULL,
CustSalesRegion VARCHAR(20) NULL,
CustStatus VARCHAR(20) NOT NULL
)
Next, we need to create the destination table:
CREATE TABLE NewCustomerDemographics (
CustomerID INT,
CustFirstName VARCHAR(50) NOT NULL,
CustLastName VARCHAR(50) NOT NULL,
CustSalesRegion VARCHAR(20) NULL,
CustStatus VARCHAR(20) NOT NULL
)
Now, let’s populate the source table with some sample data:
INSERT INTO OldCustomerDemographics VALUES ('Jim', 'Halpert', 'SOUTHWEST', 'ACTIVE')
INSERT INTO OldCustomerDemographics VALUES ('Michael', 'Scott', 'NORTHEAST', 'ACTIVE')
INSERT INTO OldCustomerDemographics VALUES ('Dwight', 'Schrute', 'WEST', 'ACTIVE')
...
To track the changes made during the ETL process, we will create an audit table:
CREATE TABLE OldCustomerDemographics_AUDIT (
CustomerID INT,
CustFirstName VARCHAR(50) NOT NULL,
CustLastName VARCHAR(50) NOT NULL,
CustSalesRegion VARCHAR(20) NULL,
CustStatus VARCHAR(20) NOT NULL,
AUDIT_NewSalesRegion VARCHAR(20) NULL
)
Now, let’s create a lookup table to standardize the customer’s sales region:
CREATE TABLE NewSalesRegions (
RegionID INT IDENTITY(1,1),
RegionCode VARCHAR(20) NOT NULL
)
INSERT INTO NewSalesRegions VALUES ('NORTH')
INSERT INTO NewSalesRegions VALUES ('NORTHEAST')
INSERT INTO NewSalesRegions VALUES ('NORTHWEST')
...
With the data in place, we can now create the SSIS package to perform the data manipulation. In the Control Flow pane, add a Data Flow Task and attach an OleDB Source to the OldCustomerDemographics table.
Next, add a Lookup Transformation to the data flow. This transformation will match the CustSalesRegion column from the OldCustomerDemographics table with the RegionCode column in the NewSalesRegions lookup table. Rows that are successfully matched will pass through, while unmatched rows will be sent to the Error Output.
For the unmatched rows, we will update them to belong to the NATIONAL sales region and send the old and new values to the audit table. To do this, add a Derived Column transform to specify the new sales region value, and a Multicast transform to send the data to both the audit table and the primary output table.
In the OleDB Destination for the audit table, connect it to the OldCustomerDemographics_AUDIT table and map the new sales region value to the audit column.
Finally, use the UNION ALL transformation to combine the updated rows with the matched rows from the Lookup Transformation. Send the output from the UNION ALL transform to an OleDb Destination, which is connected to the NewCustomerDemographics table.
By executing the SSIS package, you can see the updated rows in the NewCustomerDemographics table and the audit information in the OldCustomerDemographics_AUDIT table.
In conclusion, implementing an auditing mechanism in SQL Server Integration Services is essential for tracking data changes during the ETL process. By following the steps outlined in this article, you can ensure that your data transformations are properly audited and provide a valuable resource for future reference.