As a SQL Server consultant, I often come across various issues related to SQL Server Agent. One common problem that I have encountered is the SQL Agent not starting in a clustered environment. In this blog post, I will discuss a specific situation where SQL Agent was unable to start and provide a solution to resolve the issue.
When I attempted to start the SQL Agent service from services.msc, it would run for a few seconds and then stop on its own. To investigate the problem, I requested the SQL Agent log (SQLAgent.OUT), which by default is located in the same directory as the SQL Server ERRORLOG.
Upon examining the SQLAgent.OUT file, I found the following content:
2016-05-07 06:44:03 - ? [100] Microsoft SQLServerAgent version 11.0.2100.60 (X64 unicode retail build) : Process ID 9252
2016-05-07 06:44:03 - ? [495] The SQL Server Agent startup service account is Super\SVC.
2016-05-07 06:44:34 - ! [150] SQL Server does not accept the connection (error: 53). Waiting for Sql Server to allow connections. Operation attempted was: Verify Connection On Start.
2016-05-07 06:44:34 - ! [000] Unable to connect to server '(local)'; SQLServerAgent cannot start
2016-05-07 06:44:40 - ! [298] SQLServer Error: 53, Named Pipes Provider: Could not open a connection to SQL Server [53]. [SQLSTATE 08001]
2016-05-07 06:44:40 - ! [165] ODBC Error: 0, Login timeout expired [SQLSTATE HYT00]
2016-05-07 06:44:40 - ! [298] SQLServer Error: 53, A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online. [SQLSTATE 08001]
2016-05-07 06:44:40 - ! [382] Logon to server '(local)' failed (DisableAgentXPs)
2016-05-07 06:44:40 - ? [098] SQLServerAgent terminated (normally)
The log shows that SQL Server Agent could not connect to SQL Server. The connection attempt failed with error 53 and the message “A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections.” Because the Agent could not log on to the server, it shut itself down.
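When the log is long, it can help to pull out only the failing entries. The following is a minimal Python sketch, not part of the original troubleshooting, that assumes each entry sits on its own line in the form shown in the excerpt above (timestamp, a one-character marker, a bracketed message number, then the message) and that "!" marks the error entries, as it does in that excerpt. The function names are hypothetical.

```python
import re

# Assumed entry layout, based on the excerpt above:
#   2016-05-07 06:44:34 - ! [150] message text
# The separator may appear as "-" or as a typographic dash in copied text.
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) [-\u2013] "
    r"(?P<marker>[?+!]) \[(?P<code>\d+)\] (?P<message>.*)$"
)


def parse_agent_log(text):
    """Return one dict per log entry that matches the assumed layout."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            entries.append(match.groupdict())
    return entries


def errors_only(entries):
    """Keep only entries flagged with '!' (errors in the excerpt above)."""
    return [entry for entry in entries if entry["marker"] == "!"]
```

Passing the text of SQLAgent.OUT to parse_agent_log and then to errors_only leaves just the "!" lines, which in this case all point at the failed connection to '(local)'.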
This is a generic error with many possible causes. I first worked through the steps outlined in my blog post titled “FIX: ERROR: (provider: Named Pipes Provider, error: 40 – Could not open a connection to SQL Server)”, including starting the SQL Server Browser service on both nodes. In this particular case, however, that did not resolve the issue.
After further investigation, I discovered that the problem was caused by a DNS name resolution issue. To fix it, I added a TCP alias on both nodes, specifying the SQL Server Virtual Server Name, the SQL listening port, the protocol (TCP/IP), and the IP address of the SQL clustered instance. With the alias in place, the connection goes directly to that IP address and port instead of depending on DNS to resolve the virtual server name.
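A quick way to confirm this kind of problem from each node is to check whether the virtual server name resolves and whether the SQL port accepts a TCP connection. The following is a minimal Python sketch under stated assumptions: it is not the procedure I used at the time, check_endpoint is a hypothetical helper, and the name SQLVIRTUALNAME and port 1433 in the usage comment are placeholders for your own values.

```python
import socket


def check_endpoint(host, port, timeout=3.0):
    """Resolve host, then try a TCP connection to each resolved address.

    Returns (addresses, reachable): the sorted list of resolved IP
    addresses (empty if name resolution fails) and whether any of
    them accepted a TCP connection on the given port.
    """
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        # Name resolution failed, which is the symptom described above.
        return [], False

    addresses = sorted({info[4][0] for info in infos})
    for address in addresses:
        try:
            with socket.create_connection((address, port), timeout=timeout):
                return addresses, True
        except OSError:
            # Refused, timed out, or unreachable: try the next address.
            continue
    return addresses, False


# Hypothetical usage, run on each cluster node:
#   addresses, reachable = check_endpoint("SQLVIRTUALNAME", 1433)
#   print(addresses, reachable)
```

An empty address list points at name resolution, while a resolved address that is not reachable points at the port, a firewall, or the instance itself. Running the same check against the cluster IP address directly helps separate the two.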
If you have encountered a similar issue in a clustered environment where a resource is not coming online, I would love to hear about your experience and how you resolved it. Please share your thoughts and insights in the comments section below.