Published on

August 3, 2006

Hardware and Database Considerations for Data Warehousing

When setting up a data warehouse and selecting hardware infrastructure, there are several important factors to consider. In this article, we will explore some of these considerations and how they differ from those for transactional processing systems.

1. RAID Levels

For data warehouse databases, which are mostly static or read-only, RAID 10 is the ideal RAID level as it provides both striping and mirroring. If budget constraints are a concern, RAID 5 can be a suitable alternative.

2. Fact Table Placement

Fact tables in data warehouse databases are usually large tables. Consider placing them on separate physical files or even a separate set of disks. This can improve performance and allow for different RAID levels. Additionally, indexing strategy and table partitioning should be carefully considered.

3. Dimensional Schema

Data warehouse databases are typically arranged in dimensional schema formats such as star or snowflake schemas. This affects indexing strategy, maintenance, and referential integrity. Surrogate keys and composite primary keys are commonly used in dimensional databases.

4. Database Size and Growth

The size and growth rate of a data warehouse database impact disk layout, SAN infrastructure flexibility, memory and processor requirements, as well as backup infrastructure and restore procedures. Capacity planning is crucial in determining these requirements.

5. Staging Data

Consider where to stage data from the source system – in the file system or in a database. The choice may depend on the ETL software being used. Factors to consider include performance, resilience, network bandwidth, indexing, and referential integrity.

6. Non-Real Time Updates

Data warehouse databases are not real-time systems, unless they are active data warehouses. This affects hardware considerations such as resilience, high availability, RAID levels, clustering, replication, and backup strategies.

7. Large Query Output

Data warehouse queries are often more complex and can return a large number of records. This impacts hardware considerations, particularly in terms of disk striping requirements and the amount of memory needed for the database query engine.

8. Multidimensional Databases

For multidimensional databases, such as those used in SQL Server Analysis Services or Oracle OLAP, there are additional hardware considerations. These databases have specific optimization techniques and settings that should be taken into account.

By understanding these hardware and database considerations for data warehousing, you can make informed decisions when setting up your data warehouse infrastructure. Remember that data warehouse databases have unique characteristics and requirements that differ from transactional processing systems.

Thank you for reading!

Click to rate this post!
[Total: 0 Average: 0]

Let's work together

Send us a message or book free introductory meeting with us using button below.