Published on

December 9, 2008

Exploring Data Compression in SQL Server

One of the key advantages of migrating to SQL Server 2008 is the ability to compress your data, reducing disk overhead. Data compression stores the rows and pages of your tables and indexes in a more compact form, which can greatly reduce the amount of storage space required. Because fewer pages have to be read from disk and held in memory, this can also lead to significant performance improvements, especially in data warehouse and data mart environments where large datasets are involved.

There are two types of data compression in SQL Server: row compression and page compression. Row compression stores fixed-length data types in a variable-length format, so trailing blanks in fixed-length character strings are not stored and NULL or zero values take up no space. Page compression builds on row compression by adding prefix and dictionary compression: SQL Server finds values that repeat on a page, stores each one once in a structure on that page, and replaces the occurrences in the rows with short references to it. Because page compression includes row compression, it provides even greater storage savings for data with repeating patterns.

However, data compression is not suitable for every scenario. Depending on the workload of your system, the performance requirements, and whether or not you use encryption, data compression may not be the right solution for you. Compressing and decompressing data costs CPU time, and this overhead can hurt system performance, especially in high-volume OLTP systems.

So how do you determine if data compression is right for your database? Here are some steps you can follow:

  1. Estimate the potential storage savings that you could achieve by implementing data compression. It’s important to note that in certain circumstances, you may end up using more storage.
  2. Perform a baseline performance analysis of your database server and reproduce it in a development or staging environment.
  3. Enable compression and evaluate the performance against the baseline.
  4. Identify the tables that offer the biggest benefit. Tables with repetitive or numeric data, as well as fixed-length CHAR columns that are not fully populated, are usually excellent candidates for compression.
  5. Check and recheck your analysis against the baseline, and seek feedback from users if you decide to implement compression in a production environment.
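
As a rough sketch of step 3, the statements below enable compression on one table and then confirm the setting. The table `dbo.SalesHistory` and the index `IX_SalesHistory_OrderDate` are hypothetical names used only for illustration. A rebuild is itself an expensive operation, so this is meant for the development or staging copy described in step 2.

```sql
-- Hypothetical table and index names; substitute your own.

-- Rebuild the table (heap or clustered index) with page compression.
ALTER TABLE dbo.SalesHistory
    REBUILD WITH (DATA_COMPRESSION = PAGE);

-- Nonclustered indexes are compressed separately; here with row compression.
ALTER INDEX IX_SalesHistory_OrderDate ON dbo.SalesHistory
    REBUILD WITH (DATA_COMPRESSION = ROW);

-- Confirm the compression setting now in effect for each index and partition.
SELECT i.name AS index_name,
       p.partition_number,
       p.data_compression_desc
FROM sys.partitions AS p
JOIN sys.indexes AS i
    ON i.object_id = p.object_id
   AND i.index_id = p.index_id
WHERE p.object_id = OBJECT_ID(N'dbo.SalesHistory');
```

If the comparison against the baseline is unfavorable, rebuilding with `DATA_COMPRESSION = NONE` returns the object to its uncompressed form.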

To estimate the potential savings for a specific table, SQL Server 2008 provides a stored procedure called sp_estimate_data_compression_savings. This procedure takes the schema name, object name, index ID, partition number, and the compression type to evaluate. Passing NULL for the index ID or partition number covers all indexes or all partitions of the object. By passing ROW, PAGE, or NONE as the compression type, you can estimate the size of the object under row compression, page compression, or no compression.
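
A call might look like the following. The schema and table names are hypothetical; the parameter names are those of the system procedure.

```sql
-- Estimate the effect of page compression on one table.
-- dbo.SalesHistory is a hypothetical table name; substitute your own.
EXEC sp_estimate_data_compression_savings
    @schema_name      = N'dbo',
    @object_name      = N'SalesHistory',
    @index_id         = NULL,    -- NULL = all indexes on the table
    @partition_number = NULL,    -- NULL = all partitions
    @data_compression = N'PAGE'; -- or N'ROW' / N'NONE'
```

The result set has one row per index and partition, showing the current size and the estimated size under the requested setting, both in KB. Running the call once with ROW and once with PAGE makes it easy to see whether page compression saves enough extra space to justify its additional CPU cost.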

If you want to estimate the savings for all tables within a single database or across all databases, you can use a custom procedure, USP_Compression_Savings_By_DB. It is not built into SQL Server; it retrieves the estimated compression savings for all tables and stores the information in a table for later analysis.
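
The general idea can be sketched as below. This is not the USP_Compression_Savings_By_DB code, only a minimal illustration that assumes the current database, page compression, and a hypothetical temporary table named `#CompressionEstimates`.

```sql
-- Minimal sketch: collect page-compression estimates for every user table
-- in the current database. Not the USP_Compression_Savings_By_DB procedure.

-- Hypothetical holding table; its columns match the procedure's result set
-- by position.
CREATE TABLE #CompressionEstimates (
    obj_name            sysname,
    sch_name            sysname,
    index_id            int,
    partition_number    int,
    size_current_kb     bigint,
    size_requested_kb   bigint,
    sample_current_kb   bigint,
    sample_requested_kb bigint
);

DECLARE @schema sysname, @table sysname;

DECLARE table_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT SCHEMA_NAME(t.schema_id), t.name
    FROM sys.tables AS t
    WHERE t.is_ms_shipped = 0;

OPEN table_cursor;
FETCH NEXT FROM table_cursor INTO @schema, @table;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Capture the estimate for this table (all indexes, all partitions).
    INSERT INTO #CompressionEstimates
    EXEC sp_estimate_data_compression_savings
        @schema_name      = @schema,
        @object_name      = @table,
        @index_id         = NULL,
        @partition_number = NULL,
        @data_compression = N'PAGE';

    FETCH NEXT FROM table_cursor INTO @schema, @table;
END

CLOSE table_cursor;
DEALLOCATE table_cursor;

-- Largest estimated savings first.
SELECT sch_name, obj_name, index_id, partition_number,
       size_current_kb, size_requested_kb,
       size_current_kb - size_requested_kb AS estimated_savings_kb
FROM #CompressionEstimates
ORDER BY estimated_savings_kb DESC;
```

A negative value in `estimated_savings_kb` marks an object that is estimated to grow under compression, which is the case step 1 warns about.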

It’s important to note that the sp_estimate_data_compression_savings procedure can be resource-intensive, especially when checking all tables in a database or instance. It is best to run it either against a recent backup of your production databases restored onto another system, or on the production system during its period of least activity.

Data compression can be a powerful tool that provides significant benefits in terms of storage and performance. However, it is crucial to perform thorough analysis and testing to ensure that enabling data compression does not negatively impact your production systems.

For more information and the code for the USP_Compression_Savings_By_DB procedure, please refer to the resources section below.

Resources:
