Published on

December 3, 2016

Fixing SQL Server Cluster Patching Issues

During a recent consulting engagement, I encountered a common issue faced by many SQL Server administrators after applying a service pack on a clustered environment. The SQL Server instance was not coming online, and the client sought my assistance in resolving the problem.

My first step was to ask the client to share the ERRORLOG from the SQL instance. The ERRORLOG is a valuable resource for troubleshooting SQL Server issues. It provides detailed information about errors and events that occur during the operation of the SQL Server instance.

After analyzing the ERRORLOG, I found the following error message:

Error: 5184, Severity: 16, State: 2.
Cannot use file 'D:\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\temp_MS_AgentSigningCertificate_database.mdf' for clustered server. Only formatted files on which the cluster resource of the server has a dependency can be used. Either the disk resource containing the file is not present in the cluster group or the cluster resource of the SQL Server does not have a dependency on it.

This error indicated that the D drive, which contained a necessary file for the SQL Server instance, did not have a dependency with the SQL Server resource in the cluster. To resolve this issue, I checked the failover cluster manager and discovered that the D drive was not listed as a cluster disk.

To add the D drive as a cluster disk, I followed these steps:

  1. Opened the failover cluster manager
  2. Clicked on the highlighted area to add a new disk

However, even after adding the disk, the issue was not resolved, and the SQL Server instance still did not come online. Upon further investigation of the ERRORLOG, I found another problem:

Error: 18456, Severity: 14, State: 11.
Login failed for user 'NT AUTHORITY\SYSTEM'. Reason: Token-based server access validation failed with an infrastructure error. Check for previous errors. [CLIENT: 100.168.11.171]

It turned out that the ‘NT AUTHORITY\SYSTEM’ login did not exist in the SQL Server instance because the system databases had been rebuilt. To fix this issue, I performed the following steps:

  1. Started SQL Server using the command prompt: NET START MSSQLSERVER /m
  2. Added the ‘NT AUTHORITY\SYSTEM’ account
  3. Stopped SQL Server: NET STOP MSSQLSERVER

After completing these steps, the SQL Server instance came online successfully, and the issue was resolved.

It is not uncommon to encounter similar issues when rebuilding system databases in a clustered environment. Therefore, it is important to be aware of the potential challenges and follow the appropriate steps to resolve them.

By understanding the error messages in the ERRORLOG and taking the necessary corrective actions, SQL Server administrators can ensure a smooth recovery and minimize downtime during cluster patching.

Click to rate this post!
[Total: 0 Average: 0]

Let's work together

Send us a message or book free introductory meeting with us using button below.