Published on

September 1, 2007

Understanding SQL Server Data Extraction Methods

When populating a data warehouse, it is essential to extract data from the source systems. In this article, we will focus on extracting data from RDBMS source systems, specifically tables located on SQL Server, Oracle, Informix, DB2, Access, and other similar platforms. There are three main methods for extracting a table from an RDBMS source system into a data warehouse: getting the whole table every time, getting it incrementally (only the changes), or using a fixed period. Let’s explore each of these methods in detail.

1. Whole Table Every Time

The simplest way to download a table from the source system into the data warehouse is to extract all records every time. For smaller tables with less than 1,000 rows, it is efficient to retrieve the entire table in one go. However, for larger tables with millions of rows, it is more practical to extract the data incrementally to reduce the extraction time.

2. Incremental Extract

When dealing with large tables, an incremental extract approach can significantly reduce the extraction time. This method involves extracting only the changes made to the table since the last successful extract. There are several ways to perform incremental extracts:

  1. Using timestamp columns: If the table has creation and update timestamp columns, it is possible to extract the data based on these timestamps. By comparing the timestamps between the source system and the data warehouse, we can identify new, updated, and deleted records.
  2. Using identity column: If the table does not have reliable timestamp columns, or if the timestamps are not indexed, an alternative approach is to use an identity column. The identity column provides a unique identifier for each record, allowing us to identify new records and track changes.
  3. Using triggers: Triggers can be implemented on the source system table to capture and extract the changes made to the table. By creating triggers for insert, update, and delete operations, we can programmatically retrieve the modified records.

3. Fixed Period Extract

In some cases, it may be necessary to extract data from the source system within a fixed period. This method involves specifying a time window for the extraction, such as extracting data for the past month or year. By setting a fixed period, we can ensure that only the relevant data is extracted and loaded into the data warehouse.

Choosing the appropriate data extraction method depends on various factors, including the size of the table, the availability of timestamp columns, and the practicality of implementing triggers. It is important to consider the time window for data extraction, as well as the impact on the source system’s performance.

By understanding these data extraction methods, you can effectively populate your data warehouse and ensure that it contains accurate and up-to-date information. Stay tuned for Part 2 of this article, where we will discuss more advanced techniques for incremental extraction and explore off-the-shelf tools available for SQL Server.

Thank you for reading!

Author: Your Name

Click to rate this post!
[Total: 0 Average: 0]

Let's work together

Send us a message or book free introductory meeting with us using button below.