Many companies have strict requirements for auditing data changes. Audited data is often used to generate business reports or displayed via a front-end application. In a previous article, I discussed one solution for implementing an audit trail and provided an example of a code generator for this solution. If you haven’t read that article yet, I recommend doing so before continuing with this one.
Let’s review some important rules for implementing an audit solution. Each table should have a unique row identifier, typically a column named “row_id”. If the table already has an identity column, the row_id column can be a computed column with the same value as the identity column. The audit table should store the changed row, not the old row, to improve performance and eliminate the need for joining the audit and audited tables to retrieve old and current values. Each history table should have a modif_id field to group all rows changed in one modification.
In this article, I want to cover two additional requirements or tasks related to auditing. Firstly, the amount of standard columns that support auditing is growing and can vary based on application requirements. In the past, only four pieces of information (user, date, machine, and change type) were required for auditing. However, with the rise of web applications, shared modules, advanced security, etc., additional auditing data such as IP address, URL, application/module/screen code, and user type has become important. Storing this information in columns in each table is not practical due to the varying number of standard columns for auditing in different applications. A new idea is to use a standard XML column to store all the necessary data in XML format.
For example:
<row> <user>12</user> <module>mymodule</module> <ip>123.12.12.12</ip> <url>myurl.com</url> </row>
This XML column can be centralized in a modification table and not duplicated in each audit table, as long as all the pieces of information for supporting the audit are the same within one transaction. However, if this information can be different for each modified row in a transaction, then the XML column should be kept in each audit table.
My second task was to answer the question of which product was changed and who made the changes. This question is often asked by destination databases when the current database is the source and the destination database only needs to keep active records. To answer this question, we don’t need to know the specific pieces of information that were changed, but rather which products were changed. By implementing an audit solution, we can easily answer this question while also providing a full audit trail.
Now let’s take a look at the implementation of this solution using the “product” table as an example:
CREATE TABLE product ( product_id int primary key, product_name varchar(50), rowstatus char(1), row_id int identity(1,1), audit_xml XML, modif_id uniqueidentifier ) CREATE TABLE product_property ( property_id int identity(1,1) primary key, property_desc varchar(50), value varchar(10), rowstatus char(1), row_id as property_id, audit_xml XML, modif_id uniqueidentifier )
In addition to these tables, we also need a general “Modification” table to hold the main record of each modification:
CREATE TABLE Modification ( Modification_ID int IDENTITY(1,1) NOT NULL, Modif_ID uniqueidentifier NOT NULL, Database_NM varchar(30) NOT NULL, Table_NM varchar(100) NOT NULL, Action_CD char(1) NOT NULL, TotNumModifRows int NOT NULL, CreateDate datetime NOT NULL, CreateSource varchar(128) NOT NULL, UserMachine varchar(128) NULL, UserName varchar(128) NULL, Audit_XML xml NOT NULL )
We also need a “RowAudit” table to capture information about rows that were changed in a transaction:
CREATE TABLE RowAudit ( RowAudit_ID int IDENTITY(1,1) NOT NULL, Modif_ID uniqueidentifier NOT NULL, ROW_ID int NOT NULL, Database_NM varchar(30) NOT NULL, Table_NM varchar(100) NOT NULL, ActionCD char(1) NOT NULL, OLD_Status char(1) NOT NULL, New_Status char(1) NOT NULL, CreateDate datetime NOT NULL )
Defining what constitutes a real row change between two points in time based on the “RowAudit” table can be complex. However, in most cases, we only need the start and end points of changes. For example, if a row was inserted, then modified multiple times, and finally deleted, we only need to know that the row was deleted. The “RowAudit” table can be used to determine the necessary actions for each row.
Where should the audit tables be created? They can be created in the same database as the audited tables or in a separate database. It makes sense to create a separate database for audit tables, especially if the audit data can be significant in size. Separating the data in different databases simplifies maintenance, data offloading, and cleanup processes in the future. In my case, I decided to create one database per SQL Server with schemas named after the audited database.
For example, if the audited database is named “products”, the audit database will have all audit tables in the “products” schema:
CREATE TABLE products.a_product ( audit_id int identity(1,1) primary key, product_id int, product_name varchar(50), rowstatus char(1), row_id int, modif_id uniqueidentifier, createdate datetime, actioncd char(1) ) CREATE TABLE products.a_product_property ( audit_id int identity(1,1) primary key, property_id int, property_desc varchar(50), value varchar(10), rowstatus char(1), row_id as property_id, modif_id uniqueidentifier, createdate datetime, actioncd char(1) )
One additional consideration is to create a control table that holds flags and allows triggers to decide what should be recorded. The trigger can use the flag configuration to determine the action to take. Here’s an example of a control table:
CREATE TABLE ControlTable ( ControlTable_ID int identity(1,1) NOT NULL primary key, Table_NM varchar(100) NOT NULL, DataAudit_FLG char(1) NOT NULL, RowAudit_FLG char(1) NOT NULL, HardDelete_FLG char(1) NOT NULL )
When the trigger records changes in the “RowAudit” table, a mechanism is needed to answer the question, “Which product is changed between the last ETL and now?” This mechanism can be part of an ETL process that modifies data in the destination database. The ETL process can be designed to delete all products from the destination database and then reinsert them from the source database.
For a complete example of a trigger for a single table, please refer to the resources section.
In conclusion, this article demonstrates an example of extending audit solutions using XML data fields to accommodate a growing number of standard data elements required for auditing. It also shows how an audit solution can be transformed into a mechanism that answers simple questions without running a full-scale audit. Different ETL architectures may require different solutions, but this article provides one possible approach.