Published on

January 18, 2017

Monitoring Failed Jobs in SQL Server

As a database administrator (DBA), it is crucial to be aware of any failed jobs in your SQL Server environment. Not only do you need to know when a job fails, but you also need to remediate the issue and provide an auditable trail of your actions. In this article, we will discuss a solution for monitoring and managing failed jobs in SQL Server.

Approaches to Monitoring Failed Jobs

There are several approaches you can take to monitor failed jobs in SQL Server. One option is to install mail on the database server and let the system notify you of failures. While this is a viable solution, it may not be suitable for all versions of SQL Server or for environments with multiple servers.

Another option is to implement Reporting Services (RS) to generate reports on job failures. However, this may not be ideal for environments with a large number of servers to monitor.

There are also third-party solutions available on the market that can help you organize your jobs and notify you of failures. These solutions can be effective, but they may come with additional costs.

In this article, we will explore a custom solution for monitoring failed jobs in SQL Server.

Custom Solution Overview

The custom solution involves creating a central Monitor Server with SQL Server installed on it. For each server you wish to monitor, a linked server will be created and tested. The Monitor Server will have a job that runs periodically and retrieves data about the failed jobs from the linked servers. This data will be stored locally on the Monitor Server.

An email will be sent to all interested parties, including the DBA, detailing each failed job. The DBA will then fill in the remediation data associated with each failure. This information will be stored in a history table, providing an auditable trail of actions taken to remediate the failure.

Reports can be generated for auditors or other interested parties, detailing job failures and their remediation.

Components of the System

The custom solution involves several components:

  • Monitor Server: This is the central server where SQL Server is installed. It is responsible for retrieving data about failed jobs from the linked servers.
  • Linked Servers: These are created for each server you wish to monitor. They allow the Monitor Server to connect to the monitored servers and retrieve job failure data.
  • Staging Table: This table stores the data about failed jobs retrieved from the linked servers.
  • History Table: Once a job failure has been reported, the data is moved to the history table. Additional fields are added to the history table for remediation purposes.
  • Configuration Tables: These tables store information about the servers to be monitored, including email notification settings and grouping of servers.
  • Stored Procedures: These procedures handle the insertion of data into the history table, sending email notifications, and processing failed jobs.
  • System Jobs: These jobs are responsible for executing the stored procedures and managing the overall process of monitoring failed jobs.

Benefits of the Custom Solution

The custom solution provides several benefits:

  • Timely Notifications: The system sends email notifications for each failed job, ensuring that the DBA is promptly informed of any issues.
  • Auditable Trail: The history table allows the DBA to document the steps taken to remediate each failure, providing an auditable trail of actions.
  • Flexible Configuration: The configuration tables allow for easy management of servers to be monitored and email notification settings.
  • Scalability: The solution can be scaled to monitor multiple servers, making it suitable for environments with a large number of SQL Server instances.

Conclusion

Monitoring failed jobs in SQL Server is a critical task for DBAs. While there are various approaches and solutions available, a custom solution can provide the flexibility and control needed to effectively manage job failures. By implementing a central Monitor Server, linked servers, and a system of storing and reporting on job failures, DBAs can ensure timely remediation and maintain an auditable trail of their actions.

Click to rate this post!
[Total: 0 Average: 0]

Let's work together

Send us a message or book free introductory meeting with us using button below.