One of the worst situations a database professional can face is receiving a call reporting a production database in a “Suspect” state. This is a critical situation where the database needs to be brought online as soon as possible. In this article, we will discuss a high-level process to handle this situation.
Step 1: Inform the Customer
The first step is to proactively inform the customer about the outage before they come back to you. This step is crucial as it sets the expectation and avoids unnecessary questions about the time taken to bring the database online.
Step 2: Identify the Root Cause
Refer to the SQL Server error log to find the root cause of the issue. Starting from SQL Server 2005, it is recommended to use a filter while viewing the log and filter by “Database name” to see only the logs related to the particular database. For earlier versions of SQL Server, you will need to read the log from the latest entry backwards to find the root cause.
Step 3: Fix the Issue
Once you have identified the reason why the database is in suspect mode, you need to take appropriate steps to fix the issue. Here are some possible issues and their recommended resolutions:
Possibility 1: Missing Data/Log File
If someone deleted or misplaced a data/log file while SQL Server was offline, the database cannot start due to the missing file. The solution is to place the missing data/log file in the proper location. The SQL Server Error Log will provide the exact name and path of the missing file. Once you have placed the file, execute the following command to bring the database online with no data loss:
RESTORE DATABASE [DatabaseName] WITH RECOVERY
Possibility 2: Exclusive Lock on Data/Log File
If SQL Server cannot access or place an exclusive lock on the data or log file while coming online, it may be due to another tool (like an antivirus) placing an exclusive lock on the file. To resolve this, use process explorer to identify and kill the file handler that placed the lock. You may need to involve your system administrators for this step. Once the lock is removed, execute the following command to bring the database online with no data loss:
RESTORE DATABASE [DatabaseName] WITH RECOVERY
Possibility 3: Corrupted Transaction
If the database is in suspect mode due to a corrupted transaction, it is a worst-case scenario. In this case, you may have to lose data unless you have a good backup. This is a common case for putting an OLTP database in suspect mode. The root cause is often SQL Server abruptly going down or restarting in the middle of a transaction. If you have a good backup and can restore the database up to an acceptable point, it is recommended to do so. However, if restore is not an option, you can try the following steps:
Caution! The following steps will cause data loss and should only be executed as a last resort. It is recommended to try all other possible options, including contacting Microsoft Support, before executing these steps.
- Switch the Emergency mode on for the database using the following command:
ALTER DATABASE [DatabaseName] SET EMERGENCY;
- Execute the following command:
DBCC CHECKDB ([DatabaseName], repair_allow_data_loss);
Please note that executing the above command is extremely dangerous and can cause data loss or database integrity issues. This command authorizes SQL Server to force transactional recovery, skipping errors and scavenging as much as possible from the transaction log. After this operation is complete, the database will be back online, but you may have lost data, broken constraints, and business logic. It is recommended to involve your customers to run a sanity check on the data quality at this point.
Possibility 4: Corrupted Data File
If you find that the data file is corrupted, it is likely due to an OS or hardware-level failure. In this case, the best option is to restore from a backup without wasting time in a “Code Red” situation.
Conclusion
In this article, we have covered some possible reasons for a database to be in a “Suspect” state and the options to bring it back online as quickly as possible. It is essential to have a good backup and disaster recovery strategy in place and ensure that the customer understands the potential risks of data loss and downtime in case of a disaster. Remember to never attempt to “detach” a database in suspect mode, as it can further complicate the recovery process. Always read and understand the error log before taking any action and remain calm and focused. With the right approach and intelligence, any situation can be resolved.
If you have any recommendations, suggestions, or have experienced a situation where this article did not help you resolve the issue, please feel free to reach out to us at info@consultdba.com.