Published on

December 16, 2020

Understanding Automatic Failover in SQL Server

In a previous tip, we discussed how to set up AlwaysOn in SQL Server 2012. After successfully setting up AlwaysOn and testing the failover to the secondary replica, you may encounter a situation where automatic failover works only the first time. This behavior is expected and can be easily explained.

One important setting to check in this scenario is the “Maximum Failures in the specified period” setting, which can be found in the Application properties of your failover cluster settings. To access this setting, follow these steps:

  1. Open the Server Manager console.
  2. Go to Features > Failover Cluster Manager > Your Cluster > Services and Applications > Your Application.
  3. Right-click and select “Properties.”
  4. In the property window, click on the “Failover” tab.

By default, the failover property settings for the resource group are as follows:

  • Maximum Failures in the specified period is set to (n-1), where n is the number of nodes participating in your cluster.
  • Period (measured in hours) is set to 6.

With these default values, automatic failover will only work the first time after your setup. If a failover occurs and the resources do not come online, the cluster services will only attempt to bring the resources online once within a 6-hour period.

During testing, you can adjust these values to suit your needs. However, when going live or rolling out to production, it is crucial to ensure that these values align with your organization’s documented policies for Recovery Time Objective (RTO) and Recovery Point Objective (RPO).

Additionally, you can configure the failback settings based on the preferred owner for your cluster. In the General tab, you can select the preferred owner replicas and choose the “Allow failback with a specified time” option.

These settings, also known as the failover threshold, determine the limit of how many failovers can occur within a specified time period. It is important to prevent frequent failovers that can lead to a ping pong scenario. If you experience frequent failovers, it is recommended to investigate the underlying cause, such as hardware issues, to ensure the stability of your cluster.

To further analyze the failover events, you can view the cluster logs. Follow these steps:

  1. Open the Failover Cluster Manager.
  2. Go to Your Cluster > Cluster Events.
  3. Right-click and select “Query.”
  4. Enter the query to filter the failover events and click “OK.”

In summary, the behavior you observed during your initial failover test is due to reaching the threshold setting. This setting is in place to prevent frequent failovers and ensure the stability of your cluster. If you encounter excessive failovers, it is important to investigate the underlying cause to address any potential issues.

Remember, it is always beneficial to thoroughly test your failover scenarios before going live to ensure the reliability and resilience of your SQL Server environment.

Click to rate this post!
[Total: 0 Average: 0]

Let's work together

Send us a message or book free introductory meeting with us using button below.