Published on

March 1, 2011

Understanding Data Warehouses in SQL Server

As data in your applications grows, the database often becomes a bottleneck. Scaling a relational database can be challenging, especially for large-scale applications. One approach to address this issue is to create separate databases for writes and reads, known as transactional and reporting databases. However, sometimes the tools and techniques used to create snapshots of the transactional database for reporting purposes may not meet the requirements of an enterprise.

This is where the need for a data warehouse or an OLAP (Online Analytical Processing) system arises. It’s important to note that a data warehouse is mostly a relational database, built on the same concepts like tables, rows, columns, primary keys, and foreign keys. Before discussing how data warehouses are typically structured, let’s understand the key components that create a data flow between OLTP (Online Transaction Processing) systems and OLAP systems.

There are three major areas to consider:

  1. Change Data Capture: The OLTP system should be capable of tracking its changes and sending them to the data warehouse for historical recording. This process, known as Change Data Capture, ensures that changes made in the OLTP system are reflected in the data warehouse for reporting purposes. SQL Server 2008 offers features like Change Data Capture and Change Tracking to address these requirements.
  2. Data Integration: A batch process needs to be provisioned to periodically take the changes from the OLTP system and dump them into the data warehouse. SQL Server Integration Services is one of the tools that can help fill this gap.
  3. Structural Changes: The data warehouse needs to record these changes in a way that enables historical data analysis. This is often covered under something called Slowly Changing Dimension (SCD). SCD allows a database structure to capture historical data by creating multiple records for a given entity. Data warehouses prefer using their own primary key, known as a surrogate key.

While a data warehouse is just a relational database, the industry often attributes a specific schema style to data warehouses. The two common schema styles are Star Schema and Snowflake Schema. The motivation behind these styles is to create a flat database structure (as opposed to a normalized one) that is easy to understand, query, and analyze.

In a Star Schema, the database structure is made up of dimensions and facts. Facts are the numbers (e.g., sales, quantity) that you want to analyze. Fact tables contain these numbers and have references (foreign keys) to a set of tables that provide context around those facts. These context-providing tables are called dimensions. For example, if you have recorded $10,000 as sales, that number would go in a sales fact table and could have foreign keys attached to it that refer to the sales agent responsible for the sale and the time period in which the sale was made.

A Snowflake Schema is similar to a Star Schema, but with the difference that dimension tables are normalized. This means that the dimension tables are further broken down into sub-tables, creating a more complex structure.

The relational structure of facts and dimensions serves as an input for another analysis structure called a Cube. A Cube is a multidimensional structure where dimensions define the sides of the cube and facts define the content. Measures, which are often referred to as facts inside a cube, are rolled up as per dimensional hierarchy. These rolled-up measures are called aggregates.

Other terms frequently encountered in the context of data warehouses include:

  • Operational Data Store (ODS): An ODS is a database that focuses on reporting the most current data. It syncs with the OLTP system every few minutes and uses the same star schema and OLAP cubes as the data warehouse. However, the data inside an ODS is short-lived, typically for a few months.
  • Data Marts: Data marts are subject-specific data warehouses. They are scaled-down versions of data warehouses that support specific segments of the business, such as sales, marketing, or support. Data marts are often designed using the star schema model.

There is some debate in the industry regarding the use of data marts alongside a central data warehouse. Some experts prefer having data marts connected to a central data warehouse, acting as information staging and distribution hubs. Others argue that a centralized data warehouse is not necessary, as most users want to report on detailed data.

Understanding the concepts and structures of data warehouses is essential for building scalable and efficient systems. By leveraging the power of SQL Server and its features like Change Data Capture, Integration Services, and different schema styles, organizations can create robust data warehouses that meet their reporting and analytical needs.

Click to rate this post!
[Total: 0 Average: 0]

Let's work together

Send us a message or book free introductory meeting with us using button below.