Replication in SQL Server refers to the process of copying data from one database to another. It is a complex but essential feature that allows for data synchronization and distribution across multiple servers.
There are three main forms of replication: snapshot replication, merge replication, and transactional replication. Snapshot replication involves propagating the entire database each time, similar to the newspaper model where the whole newspaper is distributed. Merge replication incorporates changes at both the publisher and the subscriber. Transactional replication is an incremental flow of data from the publisher to the subscriber, similar to a streaming news service.
While replication is a powerful tool, it can also be prone to failure. This article will address two common causes of replication failure: when information in subscribers is not synchronized with information in the publication base tables. It will also provide guidance on where to find information on replication-specific issues.
Solution Suitability of Replication As A Solution
Replication has various use cases. It can be used to ensure live data is transferred to other servers for accurate code tests by developers. It is also used in distributed architectures, where databases in read/write mode apply data to read-only subscribers for reporting purposes.
One common architectural example where replication could be used is in a scenario where SALES_MASTER takes on the transactional load and replicates selected publication data to different databases located in different countries. For example, the REPORTS_JP database takes pub_2, which may contain articles dealing exclusively with sales in Japan.
There are two types of replication subscriptions: push subscriptions and pull subscriptions. Push subscriptions are initiated from the publisher, while pull subscriptions are initiated by the subscribers periodically querying the distribution database for unapplied transactions.
Latency, Tracer Tokens, and Replication Monitor
One common issue with replication is latency, which refers to the delay between the publisher and subscriber. Latency can be caused by various factors such as geographical distance, network traffic, transactional load, and system resources. It is important to monitor and manage latency to ensure replication is functioning properly.
Microsoft provides a tool called Replication Monitor to monitor latency. However, it is important to note that this tool is considered inaccurate by some DBAs and can occasionally cause resource conflicts. Alternatively, DBAs can measure latency themselves using T-SQL and tracer tokens.
Tracer tokens are inserted and measured at the publisher to determine the delay between different parts of the replication process. By automatically measuring these statistics, DBAs can create procedures to monitor latency and alert automatically when problems are detected.
Stalled Agents and Jobs, and Finding Further Diagnostic Information
Stalled agents and jobs are another common issue in replication. Stalled agents can cause increased latency and non-movement of data, leading to synchronization issues. It is important to identify the cause of stalled agents and take appropriate actions to resolve the issue.
DBAs can find further diagnostic information by examining the replication-specific tables on the distributor. Tables such as msrepl_errors, mspublications, mssubscriptions, msarticles, msrepl_transactions, and msdistribution_history provide valuable information on errors and replication processes.
Depending on the situation, DBAs may need to restart agents or reinitialize publications. Reinitialization involves rewriting the articles at the subscribers, overwriting the contents with the snapshot contents. This process can be resource-intensive and should be done during a quiet period.
Conclusion
SQL Server replication is a powerful feature that allows for data synchronization and distribution across multiple servers. However, it can also be prone to failure. Understanding the common causes of replication failure and knowing where to find diagnostic information is essential for effective replication management.
By monitoring latency, identifying stalled agents and jobs, and utilizing the available diagnostic tools, DBAs can ensure that replication is functioning properly and data is synchronized across all servers.