Data Manipulation Language (DML) transactions are an essential part of any workload that runs on SQL Server. Having a solid understanding of how DML statements work under the hood is crucial for writing better-performing code. In this article, we will delve into the internals of the three main DML statements – insert, update, and delete – when used against heaps.
A heap is a table without a clustered index, meaning it does not follow a logical ordering of rows. SQL Server uses Index Allocation Map (IAM) pages to track and scan the data pages allocated by the table. However, this lack of ordering can result in poor performance due to full scans when searching for records that match the query’s filter criteria.
When inserting a row into a heap, SQL Server looks for available space and inserts the row wherever it can fit. It uses the IAM pages to find the extents belonging to the table and the PFS pages to track free space on the data pages. If there is no space on existing pages, SQL Server tries to find unallocated pages within the same extent. If none are available, it allocates a new extent.
Deleting a row from a heap does not automatically release the space or move data on the page. SQL Server simply sets the row offset for the deleted row to 0, indicating that the space can be used for a new row. Empty pages are not deallocated unless the heap table is rebuilt or a clustered index is created. However, the space held by the deleted rows can still be reused for future inserts.
Updating rows in a heap can occur in place or not in place. In-place updates happen when the row’s value changes without moving it to a different page. If the row moves to another page or a different location on the same page, it is considered an update not in place and is performed as a delete followed by an insert. Not-in-place updates in heaps result in forwarding pointers, which are 16-byte rows that point to the new location of the forwarded row.
Forwarding pointers introduce multiple physical reads during a single RID (Row ID) lookup, leading to poor performance. They mainly occur when a variable-length column is updated to hold values larger than can fit in the same location. To fix forwarding pointers, it is recommended to create a clustered index on the table or rebuild the heap table.
While heaps may offer better performance for quickly inserting large amounts of data, the benefits of having a clustered index generally outweigh those of heaps. Key lookups against a clustered index have minimal performance impact compared to the physical I/O introduced by RID lookups against heaps.
In conclusion, understanding the internals of DML transactions in SQL Server, particularly when working with heaps, is crucial for optimizing performance. By knowing how inserts, updates, and deletes work against heaps, you can make informed decisions on when to use heaps and when to consider using a clustered index. In the next article, we will explore the impact of DML operations on tables with B-tree indexes, page splits, and fragmentation.