Published on

February 28, 2021

Understanding Lease Timeouts and Health Checks in SQL Server Always On Availability Groups

SQL Server Always On Availability Groups provide a resilient high availability and disaster recovery solution in a multi-node architecture. In this article, we will explore the concepts of lease timeouts and health checks in SQL Server Always On Availability Groups.

Lease Timeouts

In a synchronous-commit availability group with automatic failover, lease timeouts play a crucial role in determining when a failover should occur. The lease timeout threshold works alongside the flexible failover policy, which consists of the health-check timeout threshold and the failure-condition level, and alongside the cluster timeouts.

The lease timeout period defaults to 20 seconds (20000 milliseconds). The lease worker thread and the resource host each maintain a time-to-live (TTL), which is updated each time a thread wakes up after receiving a signal. If the lease timeout period elapses while a thread is still waiting for a signal, the lease expires and the availability group resource goes into the resolving state, which can trigger a failover.
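The time-to-live behaviour described above can be illustrated with a small Python sketch. This is a toy model only, not SQL Server's internal implementation: the `LeaseModel` class, its method names, and the `ONLINE` label are hypothetical, and time is passed in as plain millisecond values so the example stays deterministic.

```python
from dataclasses import dataclass

LEASE_TIMEOUT_MS = 20_000  # default lease timeout quoted in this article


@dataclass
class LeaseModel:
    """Toy model of a lease with a time-to-live (hypothetical, for illustration)."""

    timeout_ms: int = LEASE_TIMEOUT_MS
    expires_at_ms: int = 0

    def renew(self, now_ms: int) -> None:
        # A renewal signal arrived: push the expiry out by one full timeout.
        self.expires_at_ms = now_ms + self.timeout_ms

    def state(self, now_ms: int) -> str:
        # Once the time-to-live has elapsed without a renewal,
        # the resource is modelled as RESOLVING.
        return "ONLINE" if now_ms < self.expires_at_ms else "RESOLVING"


lease = LeaseModel()
lease.renew(now_ms=0)
print(lease.state(now_ms=15_000))  # renewed 15 s ago, still inside the lease
lease.renew(now_ms=15_000)
print(lease.state(now_ms=30_000))  # the renewal at 15 s extended the lease
print(lease.state(now_ms=35_000))  # no signal for 20 s, lease has expired
```

In this model, each signal simply resets the expiry, so the resource only reaches the resolving state after a full timeout period passes with no signal at all.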

Health Checks

SQL Server Always On Availability Groups perform health checks of the primary replica using the sp_server_diagnostics stored procedure. By default, the health-check timeout is 30 seconds (30000 milliseconds), and the stored procedure returns results every 10 seconds, one third of that timeout. If the stored procedure does not return any results or reports errors during a health check, the availability group continues to rely on the previously reported state to determine instance health until the health-check timeout threshold is reached.

The health-check timeout is an essential factor in determining when an automatic failover should be initiated. If the sp_server_diagnostics stored procedure does not return any data within the health-check timeout period, the primary replica is considered unresponsive. Whether this initiates an automatic failover depends on the configured failure-condition level.
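The rule above (keep using the previous state until the timeout, then treat the primary as unresponsive) can be sketched in Python. This is a simplified model under stated assumptions, not the actual cluster resource logic: the function name `evaluate_health` and the state labels `clean` and `unresponsive` are hypothetical, and the failure-condition level is left out.

```python
HEALTH_CHECK_TIMEOUT_MS = 30_000  # default health-check timeout from this article
DIAGNOSTICS_INTERVAL_MS = 10_000  # sp_server_diagnostics interval from this article


def evaluate_health(last_result_ms, last_state, now_ms,
                    timeout_ms=HEALTH_CHECK_TIMEOUT_MS):
    """Return the state a monitor would act on at now_ms (hypothetical model).

    last_result_ms: time at which the most recent diagnostics result arrived
    last_state:     state reported by that result, e.g. "clean"
    """
    if now_ms - last_result_ms < timeout_ms:
        # Still inside the timeout window: keep relying on the previous state.
        return last_state
    # No result for a full timeout period: treat the primary as unresponsive.
    return "unresponsive"


# The last clean result arrived at t = 0 and nothing has come back since.
for now_ms in range(0, 40_001, DIAGNOSTICS_INTERVAL_MS):
    print(now_ms, evaluate_health(0, "clean", now_ms))
```

With the default values, up to two consecutive missed results are tolerated because the previous state is still trusted; the state only changes once a full 30 seconds pass without data.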

Monitoring and Logs

To monitor and investigate failures related to lease timeouts and health checks, various logs can be useful:

  • SQL Server error logs
  • Windows cluster logs
  • Cluster event logs
  • SQL Server failover diagnostics (sp_server_diagnostics) logs
  • AlwaysOn_Health extended events output
  • System_health extended events
  • Application and system logs

Reviewing these logs can help identify the scenarios where automatic failover occurred or failed.

Conclusion

In this article, we explored the concepts of lease timeouts and health checks in SQL Server Always On Availability Groups. Understanding these concepts is crucial for ensuring the resilience and high availability of your SQL Server environment. By configuring appropriate lease timeout and health check timeout values, you can optimize the failover process and minimize downtime.
