System Center Operations Manager (SCOM) 2007 is a powerful enterprise tool designed for monitoring the health state of Windows applications in a network domain. One of the applications that SCOM can monitor is SQL Server. However, the out-of-the-box functionality for monitoring SQL Server blocking is minimal, requiring a custom solution.
When working with SCOM customization, database professionals face several challenges. First, finding online database-specific information for SCOM is not easy. Techniques applicable to database problems are usually communicated in a non-database context, making it difficult to translate to a SQL Server scenario. Second, the Microsoft documentation for SCOM is limited and does not cover the fundamentals thoroughly, especially when it comes to SQL Server. Third, SCOM itself has inconsistencies and bugs, making it a steep learning curve for anyone trying to customize it.
Despite these challenges, the benefits of SCOM customization for monitoring SQL Server blocking far outweigh the effort required to make it work. In this article, we will explore a custom solution for monitoring blocking occurrences in a SQL Server database using SCOM.
Initial Attempts
Our requirement was to expose blocking occurrences with wait times greater than 10 minutes as alerts on the SCOM console, while also triggering email notifications to our DBA group. The default blocking solution provided by SCOM was not sufficient, so we needed to create custom rules.
However, we encountered a challenge when trying to override the monitor parameters to generate custom alert descriptions. Due to a bug in the current version of SCOM, we were unable to create a diagnostic task that would run a custom script to query the SQL instance and send email notifications. After seeking support from Microsoft and reading articles by other experts, we found a solution.
Implementation
The strategy we used was to create two custom rules. The first rule would query the database server for blocking conditions at set intervals and report any occurrences with a custom description in the Application Event Log. The second rule would scan the Application Log periodically and generate a critical alert with the description taken from the event. Our subscription would then pick up the alert and send us an email notification.
To cover both SQL Server 2000 and SQL Server 2005 instances, we created two rules, one for each version. The first rule used a vbs script to query the SQL instance and write an entry to the Windows Application Log if a blocking condition lasting over 10 minutes was detected. The second rule scanned the Application Log for events with Event ID 999 and Event Source SQLBlocks, generating alerts with the event description.
Testing
Before deploying the custom rules to our production environment, we tested them on our test servers. This allowed us to verify that the new functionality worked correctly without generating false alerts. Once we were confident in the results, we exported the custom rules as management packs and deployed them to our production environment.
To simulate blocking conditions for testing, we used a script that induced blocking in our test servers. By running the script in two separate sessions, we were able to simulate blocking scenarios with different numbers of blocking and blocked spids.
Limitations
While implementing the custom solution, we encountered some limitations. One limitation was the length of the event description string, which caused access-denied errors when the number of spids exceeded a certain threshold. To address this, we kept the email notifications short and concise.
Another limitation was the inability to record the SQL text of blocking and blocked spids in a centralized analysis repository for SQL Server 2000 instances. The required sysadmin access on the SQL instance was not transferrable to the Local System account, which is used by the OpsMgr Health Service. However, this was not an issue for SQL Server 2005 instances, as the Dynamic Management Views used for retrieving the SQL text were readily accessible.
Results
With the custom rules in place, we were able to receive email notifications whenever a blocking alert was generated in the SCOM console. The email notifications contained basic information about the blocking occurrences, such as the blocking SPID, database, host, program, and wait time.
Overall, the custom solution for monitoring SQL Server blocking using SCOM proved to be scalable and maintainable. Whether you have 10, 100, or 1000 SQL instances in your environment, the implementation is the same. Once the basic functionality is in place, SCOM automatically runs it on every managed server in the enterprise, making it a powerful monitoring tool for database professionals.
In conclusion, by leveraging the power of System Center Operations Manager, database professionals can effectively monitor and manage SQL Server blocking in their environment. With the custom solution described in this article, you can gain valuable insights into blocking occurrences and take proactive measures to optimize your SQL Server performance.