With the rise of Big Data and various cloud offerings, the question arises: is the traditional Data Warehouse (DW) even needed? Some argue that newer technologies like Data Lakes, Hadoop, and Python can replace the traditional DW. However, the traditional, multi-dimensional DW still offers unmatched data analysis capabilities.
In this article, we will explore the concept of a Kimball DW or data mart, which involves developing fact and dimension tables from one or more sources. While the final product may be familiar, the method is not. We will focus on creating a clean dimension table and a narrow fact table that displays values from the dimension tables.
One of the key challenges in modernizing the traditional DW is to speed up development and lower costs. The traditional approach of taking 90 days to produce a working data mart is no longer feasible in today’s fast-paced environment. We need to shorten this timeline to 90 minutes or less. Additionally, the cost of building and maintaining a data warehouse must be significantly reduced.
Here are some key considerations and best practices for modernizing the traditional DW:
- Simplify Dimensions: While dimensions are the heart of the DW, spending excessive time on perfecting them is not necessary. Instead of engaging in lengthy discussions with various departments, make a choice and move on. For example, using abbreviations like “Rd.” instead of “road” can save time without compromising the analysis.
- Use the Front-End: Instead of making structural changes within SQL Server, rely on the presentation layer to make label changes. This approach is easier and more flexible.
- Don’t Be Afraid to Start Over: If the development process is not going as planned, don’t hesitate to trash it and start fresh. It’s better to invest time in building a solid foundation than to continue with a flawed approach.
- Don’t Assume Lowest Grain for Fact Table: It’s not always necessary to drill down to the lowest level of granularity in the fact table. Using aggregations and roll-ups can save time and still provide valuable insights.
- Load the Fact Table Vertically: Loading the fact table one row at a time can be slow. Instead, take advantage of recordset logic and minimally logged operations to load column by column. Consider using degenerate dimensions to improve loading performance.
By following these best practices, you can streamline the development process and create a modernized data warehouse that meets the needs of today’s data analysis requirements.
Creating a data warehouse involves several steps:
- Create a Database: Start by creating a database for your data warehouse.
- Import the Data: Import the data from your source(s) into the database.
- Create Dimension Tables: Create dimension tables, such as a date dimension or location dimension, using appropriate schema scripts.
- Create the Fact Table: Create the fact table that will store the main data from your system.
- Set Foreign Key Relationships: Establish the necessary relationships between the dimension tables and the fact table.
Throughout the process, it’s important to keep the development timeline short and the costs low. By challenging traditional rules and adopting a modern mindset, you can create a data warehouse that is efficient, cost-effective, and capable of providing valuable insights.
Modernizing the traditional data warehouse is an ongoing process. As technology continues to evolve, we need to adapt and find new ways to improve the development and maintenance of data warehouses. The Kimball DW methodology still holds its value, but we must continue to innovate and explore new approaches to extend its lifespan.
About the Author: Lee Everest has been a SQL Server developer since 1998 and specializes in data mart and DW development using Kimball methodologies. He is passionate about finding efficient and cost-effective solutions for data analysis.