In this blog post, we will delve deeper into the concept of wait stats in SQL Server. Wait stats occur when SQL Server has to wait for resources to execute a task. These waits are recorded by SQL Server along with the reason for the delay. By analyzing these wait stats, we can gain insights into the reasons for task delays and potentially eliminate them to improve SQL Server performance.
Before we dive into the details of wait types and wait stats, it’s important to understand three key milestones in the query life-cycle:
- Running: A query that is currently being executed on a CPU. This query is responsible for CPU time.
- Runnable: A query that is ready to execute but waiting for its turn to run. This query is responsible for Signal Wait time. In other words, the query is ready to run, but the CPU is currently servicing another query.
- Suspended: A query that is waiting due to any reason (which we will explore through wait stats) to be converted to a runnable state. This query is responsible for wait time. In other words, this is the time we aim to reduce.
In simple terms, the query execution time is the sum of the query’s executing CPU time (running), query wait time (suspended), and query signal wait time (runnable). It’s important to note that a query can go through these stats multiple times.
Let’s try to understand this concept with a simple analogy of a taxi and a passenger. Imagine two friends, Tom and Danny, going to the mall together. When they leave the mall, they decide to take a taxi. Tom and Danny both stand in line, waiting for their turn to get into the taxi. This waiting time represents the Signal Wait Time, as they are ready to get into the taxi but have to wait for their turn while the taxis are serving other customers. In other words, they are in a runnable state.
Now, when it’s their turn to get into the taxi, the taxi driver informs them that he only accepts cash and doesn’t take credit cards. Unfortunately, neither Tom nor Danny have enough cash, so they both cannot get into the vehicle. Tom waits outside in the queue, while Danny goes to an ATM to withdraw cash. During this time, the taxi cannot wait and has to let other passengers get into the taxi. As Tom and Danny both wait outside in the queue, this represents the Query Wait Time, and they are in a suspended state. They cannot do anything until they get the cash.
Once Danny gets the cash, they both join the line again, creating another Signal Wait Time. This time, when their turn comes, they can pay the taxi driver in cash and reach their destination. The time taken for the taxi to travel from the mall to the destination represents the running time (CPU time), and the taxi is running.
I hope this analogy helps clarify the concept of wait stats. You can check the Signal Wait stats using the following query by Glenn Berry:
-- Signal Waits for instance
SELECT
CAST(100.0 * SUM(signal_wait_time_ms) / SUM(wait_time_ms) AS NUMERIC(20, 2)) AS [%signal (cpu) waits],
CAST(100.0 * SUM(wait_time_ms - signal_wait_time_ms) / SUM(wait_time_ms) AS NUMERIC(20, 2)) AS [%resource waits]
FROM
sys.dm_os_wait_stats
OPTION (RECOMPILE);
A higher value for Signal Wait stats is not good for the system and indicates CPU pressure. In my experience, when systems are running smoothly without any glitches, the Signal Wait stat is lower than 20%. However, this number can vary and is not documented anywhere. In general, lower values are better, while higher values are not good for the system.
In future articles, we will discuss in detail the various wait types and wait stats, as well as their resolutions. Stay tuned for more posts in the Wait Types and Queue series.