Published on

September 9, 2007

Understanding Surrogate-Keys in SQL Server

Surrogate-keys are an essential component of data warehousing in SQL Server. They provide performance advantages and enable the implementation of slowly changing dimensions. In this article, we will explore the concept of surrogate-keys and discuss how to generate them for Type 1 and Type 2 dimensions using SQL Server Integration Services (SSIS).

What are Type 1 and Type 2 dimensions?

In data warehousing, Type 1 dimensions represent the current state of the business without maintaining any history of dimensional changes. On the other hand, Type 2 dimensions track historical changes, providing both the current and past views of the business at any given point in time.

What are Surrogate-Keys?

In a data warehouse, surrogate-keys are generated for all members in the dimensions. These keys replace the source keys from the OLTP system and ensure consistency and reduce the size of the tables. Surrogate-keys also play a crucial role in creating Type 1 and Type 2 slowly changing dimensions.

Generating Surrogate-Keys using SSIS

SQL Server Integration Services (SSIS) provides a framework for efficiently developing data warehousing ETL mechanisms. To generate surrogate-keys, we need to read the highest value of existing surrogate keys from the database and programmatically generate new keys without constant reading and writing to the database.

The process involves three Data Flow tasks: updating attribute changes, inserting new dimension members, and handling early facts. The attribute updating process queries the source data, filters existing dimension members, and updates the dimension table with changes. The insertion of new dimension members involves generating surrogate-keys and inserting them into the dimension table. Handling early facts involves identifying and handling dimension members that are still unknown to the dimension tables.

Implementing Type 2 Dimensions

For Type 2 dimensions, a new record is inserted for attribute changes, retaining the history records instead of updating them. The current record identifier for each existing dimension record where the change took place is changed, and the same dimension member is inserted again with a new surrogate-key along with the changed attributes, creation timestamp, and a current record identifier.

Enriching Type 2 Dimensions

Additional enhancements can be made to Type 2 dimensions, such as storing an expiration date and assigning a change version number to track the number of times a record has been versioned.

Surrogate-Key Pipeline

In the surrogate-key pipeline, all surrogate-key replacements on each fact data row for every conformed dimension occur uninterruptedly. The dimension lookup tables are cached for faster access, and both incoming fact data and lookup tables can be presorted for efficient replacements. Multiple surrogate-key replacement pipelines can run in parallel using conformed dimensions to meet service level agreements.

Conclusion

Surrogate-keys are crucial for data warehousing in SQL Server. They provide performance advantages and enable the implementation of slowly changing dimensions. By using SQL Server Integration Services, we can efficiently generate surrogate-keys for Type 1 and Type 2 dimensions and replace natural keys in the fact data with the appropriate surrogate-keys. SSIS provides a scalable framework for implementing data warehousing procedures that can take advantage of multi-processor servers.

Click to rate this post!
[Total: 0 Average: 0]

Let's work together

Send us a message or book free introductory meeting with us using button below.