As a developer and database professional, I have encountered the classic problem of balancing OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing) needs in SQL Server. Many companies face the challenge of using the same database for both transactional processing and reporting, which often leads to performance degradation. To address this issue, a common solution is to create a separate “reporting” database dedicated to supporting reporting needs. However, implementing this solution is not as simple as it may seem.
In my recent reporting project, we had several working constraints:
- No performance impact on the current production OLTP transactions
- Maximum of 15-minute lag time between the reporting system and the transactional system
- Dramatic improvement of report performance
- Cannot use SQL Server Reporting Services
- OLTP system must be purged on a regular basis
- Reporting database must not be purged
These seemingly straightforward requirements added complexity to the project. We explored various ideas and solutions to meet the business needs, and our final solution involved the following architecture:
Copying OLTP Data to Reporting Server
The first step was to migrate data from the OLTP system to the reporting database. To avoid impacting the performance of the OLTP system, we decided to create a staging database on a separate physical server. This staging database would serve as the source for our ETL (Extract, Transform, Load) processes.
Next, we needed to regularly copy the staged data to the reporting server. After considering different options such as restored backups, log shipping, and replication, we chose SQL Server 2005 mirroring. Mirroring provided low impact on the OLTP system, easy setup, and maintenance.
Mirrored Databases – Unavailable
One challenge we encountered with mirrored databases was that the target database in a mirrored configuration is not available for use, even for read-only purposes. However, we discovered the Database Snapshot feature in SQL Server 2005, which allowed us to create a snapshot of the mirrored database and use it as the source for our ETL jobs. Creating a snapshot is a fast operation, even for sizable databases.
To create a snapshot of an existing database, you can use the following T-SQL statement:
CREATE DATABASE MySnapshot ON ( NAME = Snapshot_Data, FILENAME = 'H:\Data\Snapshot\MySnapshot_data.ss' ) AS SNAPSHOT OF MySourceDB
ETL and Contention Issues
Another concern was the complexity of the business logic in our ETL jobs. The logic was CPU and IO intensive, and we were worried about contention issues during the loading and updating of the reporting database. To address this, we created two separate reporting databases on the reporting server: “ReportingDB1” and “ReportingDB2”. By separating the databases, we alleviated blocking issues between the ETL jobs and report users. Although we couldn’t afford separate physical servers, we placed the two databases on separate LUNs on the SAN to reduce IO contention.
The ETL loaders were designed to keep track of the status of the two databases. One database was always “live” while the other was always “loading”. The databases were updated in a round-robin fashion every 15 minutes. The reporting UI also monitored these status values to determine which database to use when generating reports. In the future, if budget allows, we can easily move one or both of the databases to their own dedicated servers.
Record Deltas
One challenge we faced was determining which data needed to be migrated from the OLTP system to the reporting system with each ETL iteration. To address this, we added auditing columns (CreateDate and LastEditDate) to every table in the OLTP system. These columns allowed us to track changes and compare them to the last known run date of the ETL jobs. This approach ensured that only the necessary data was migrated, minimizing the impact on the system.
ETL Load and IO
Working with ETL tools like SSIS (SQL Server Integration Services) can be challenging, especially for those new to the technology. It is important to invest time in learning and mastering these tools to avoid potential pitfalls. In our project, we encountered some issues that were resolved with SQL Server 2005 Service Pack 2.
IO resources play a crucial role in the completion of data-intensive ETL jobs. While our architecture is holding up in production, our IO subsystem (an EMC Symmetrix SAN) is under significant load. It is essential to plan and optimize the IO subsystem to ensure smooth operation.
Summary
Creating a reporting system to alleviate load in your OLTP environment is a common challenge for DBAs and data architects. There is no one-size-fits-all solution, and it requires careful planning and implementation. The architecture described in this article provides one possible solution to guide you in implementing your own custom solution. I welcome feedback and suggestions for improvement, as there is always room for refinement and future iterations of reporting systems.