When designing a database table in SQL Server, one of the important decisions to make is selecting the primary key. The primary key is a column or a combination of columns that uniquely identify a record in the table. However, before choosing the primary key, it is essential to understand the concept of candidate keys.
What are Candidate Keys?
A candidate key is a column, or a combination of columns, whose values uniquely identify each row in a table. In other words, it is a potential primary key. A table can have several candidate keys, and any one of them can be chosen as the primary key.
Let’s consider an example to understand candidate keys better. Suppose we have a table with three columns: ProductID, Name, and ProductNumber. If each column holds unique values, each one is a single-column candidate key. Combinations of columns can also identify rows uniquely, although strictly speaking a combination that already contains a unique column is a superkey rather than a minimal candidate key.
It is important to note that a table can have only one primary key, although that key may consist of a single column or of several columns. Choosing the primary key from among the candidate keys can be perplexing, but it is a decision that has to be made.
Verifying Candidate Keys
To verify that the candidate keys are unique, we can run a script in SQL Server. If every query returns the same number of rows as the full table, each column currently holds only distinct values and therefore meets the uniqueness requirement of a primary key.
USE AdventureWorks
GO
-- Baseline: every row in the table
SELECT * FROM Production.Product
GO
-- Each DISTINCT query should return as many rows as the baseline
SELECT DISTINCT ProductID FROM Production.Product
GO
SELECT DISTINCT Name FROM Production.Product
GO
SELECT DISTINCT ProductNumber FROM Production.Product
GO
All of the above queries will return the same number of records, confirming that they all qualify as candidate keys.
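Comparing row counts by eye across four result sets is easy to get wrong, so the same check can be sketched as a single query that puts the counts side by side. This is only an illustration against the same Production.Product table; the column aliases and the derived-table alias are made up for the example.

```sql
USE AdventureWorks;
GO

-- One row of counts: a column is currently unique when its
-- distinct count equals TotalRows.
-- Caveat: COUNT(DISTINCT col) ignores NULLs, so a nullable column
-- can report fewer distinct values than TotalRows.
SELECT
    COUNT(*)                      AS TotalRows,
    COUNT(DISTINCT ProductID)     AS DistinctProductID,
    COUNT(DISTINCT Name)          AS DistinctName,
    COUNT(DISTINCT ProductNumber) AS DistinctProductNumber
FROM Production.Product;
GO

-- The same idea for a combination of columns: count the distinct
-- (Name, ProductNumber) pairs and compare the result to TotalRows.
SELECT COUNT(*) AS DistinctNameAndNumber
FROM (SELECT DISTINCT Name, ProductNumber
      FROM Production.Product) AS DistinctPairs;
GO
```

Either form only describes the data as it stands today; it says nothing about whether duplicates could be inserted later.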
Selecting the Primary Key
When selecting the primary key, there are a few points to consider:
- Select a key that does not contain NULL: Even if a candidate key currently does not have any NULL values, it should not have the potential to contain NULL values in the future. A column that can potentially contain NULL values does not qualify as a primary key.
- Select a key that is unique and does not repeat: A candidate key that is unique at present may contain duplicate values in the future. Such candidate keys do not qualify as primary keys.
- Make sure the primary key does not keep changing: While not a hard and fast rule, it is generally recommended to select a primary key that does not frequently change. Changing primary key values can adversely affect database integrity, data statistics, and indexes.
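The first two points above can be checked, at least for the current state of the table, with two short queries. The following is a rough sketch against Production.Product: the first query reads the column definitions from the standard INFORMATION_SCHEMA.COLUMNS view, and the second lists any values that already repeat. The alias Occurrences is made up for the example.

```sql
USE AdventureWorks;
GO

-- Rule 1: is the column allowed to hold NULL?
-- IS_NULLABLE is 'YES' or 'NO' for each column.
SELECT COLUMN_NAME, DATA_TYPE, IS_NULLABLE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'Production'
  AND TABLE_NAME   = 'Product'
  AND COLUMN_NAME IN ('ProductID', 'Name', 'ProductNumber');
GO

-- Rule 2: does any value already repeat?
-- An empty result means the column is unique right now.
SELECT ProductNumber, COUNT(*) AS Occurrences
FROM Production.Product
GROUP BY ProductNumber
HAVING COUNT(*) > 1;
GO
```

Neither query can prove that a column will stay non-NULL or unique in the future. That guarantee comes only from declaring the column NOT NULL and placing a PRIMARY KEY or UNIQUE constraint on it.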
Applying these rules to our example table, we can determine the appropriate candidate for the primary key. The Name column can potentially contain NULL values, so it does not qualify. The ProductNumber column can be duplicated for different manufacturers, so it also does not qualify. However, the ProductID column, which is an identity column, uniquely identifies each row and will be used as a foreign key in other tables. Therefore, in this case, the ProductID column qualifies as the primary key.
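To make the outcome concrete, here is a minimal sketch of how such a table might be declared. The table name dbo.ProductDemo, the data types, and the constraint name are hypothetical and are not taken from AdventureWorks; they simply mirror the reasoning above.

```sql
-- Hypothetical demo table that mirrors the three example columns.
CREATE TABLE dbo.ProductDemo
(
    -- Identity column: generated by SQL Server, never NULL, and
    -- not expected to change once assigned.
    ProductID     INT IDENTITY(1,1) NOT NULL,

    -- Assumed nullable here, which rules it out as a primary key.
    Name          NVARCHAR(50)      NULL,

    -- Assumed to repeat across manufacturers, so it gets no
    -- UNIQUE constraint in this sketch.
    ProductNumber NVARCHAR(25)      NOT NULL,

    CONSTRAINT PK_ProductDemo PRIMARY KEY (ProductID)
);
GO
```

Once the constraint exists, SQL Server rejects any insert or update that would leave ProductID duplicated or NULL.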
It is worth mentioning that while many database experts suggest not using an identity column as the primary key, our example presents a different scenario. In this case, the ProductID column is not only unique but also plays a role in joins and is used as a foreign key in other tables.
Foreign Keys and Self-Referencing Foreign Keys
A foreign key in a database table refers to the primary key (or, in SQL Server, another unique key) of a table, usually a different one. A primary key can be referenced by multiple foreign keys from other tables, but it does not have to be referenced by any foreign key at all. A foreign key can also refer back to its own table, pointing from one column to a different column of the same table. This type of foreign key is known as a “self-referencing foreign key.”
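Both kinds of foreign key can be illustrated with a small sketch. The tables dbo.EmployeeDemo and dbo.TimesheetDemo, along with all column and constraint names, are hypothetical and exist only for this example.

```sql
-- Hypothetical table with a self-referencing foreign key:
-- ManagerID points back to EmployeeID in the same table.
CREATE TABLE dbo.EmployeeDemo
(
    EmployeeID INT           NOT NULL,
    FullName   NVARCHAR(100) NOT NULL,
    ManagerID  INT           NULL,   -- NULL for someone with no manager
    CONSTRAINT PK_EmployeeDemo PRIMARY KEY (EmployeeID),
    CONSTRAINT FK_EmployeeDemo_Manager
        FOREIGN KEY (ManagerID) REFERENCES dbo.EmployeeDemo (EmployeeID)
);
GO

-- Hypothetical second table with an ordinary foreign key that
-- refers to the primary key of dbo.EmployeeDemo.
CREATE TABLE dbo.TimesheetDemo
(
    TimesheetID INT  NOT NULL,
    EmployeeID  INT  NOT NULL,
    WorkDate    DATE NOT NULL,
    CONSTRAINT PK_TimesheetDemo PRIMARY KEY (TimesheetID),
    CONSTRAINT FK_TimesheetDemo_Employee
        FOREIGN KEY (EmployeeID) REFERENCES dbo.EmployeeDemo (EmployeeID)
);
GO
```

With these constraints in place, a ManagerID or a timesheet EmployeeID that does not match an existing EmployeeID is rejected.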
Summary
In summary, a table can have multiple candidate keys that are unique as single columns or combinations of multiple columns. These candidate keys are all potential primary keys. The best candidates for the primary key are those that follow three rules: they are not null, have unique values in the table, and remain static. If there are multiple candidate keys that satisfy the criteria for a primary key, the decision should be made by experienced database administrators who consider performance implications.
Choosing the right primary key is crucial for maintaining data integrity and optimizing database performance. By understanding candidate keys and their characteristics, you can make informed decisions when designing your SQL Server database tables.