When working with databases, it is common to use the same column name across different tables to denote the same item of data. However, it is important to ensure that these columns have the same datatype to avoid potential issues and performance overhead.
In SQL Server, columns with the same name in different tables are typically used to link tables together as key/foreign keys. If the datatypes of these columns are not the same, the SQL Server optimizer has to do additional work to make the columns comparable. This can result in unnecessary overhead and decreased performance.
Furthermore, having different datatypes for the same column can also have implications for data integrity. It can potentially invalidate the underlying table and any client applications that rely on the column. For example, if a column should have been defined as an integer (int) but has been defined as a tinyint, a large value will cause an overflow error.
To address this issue, a utility can be used to identify columns that have mismatched datatypes across tables in a database. This utility compares the datatypes, maximum character length, and numeric precision and scale attributes of columns with the same name. It then sorts the results by the prevalence of the column name within all the column names.
By using this utility, you can quickly identify which columns have mismatched datatypes and take appropriate actions to correct them. This will not only improve performance but also ensure data integrity and maintainability.
It is important to note that the utility described in this article is database-specific. Therefore, it needs to be run inside each database you are interested in.
Here is an example of the code used in the utility:
-- Calculate prevalence of column name
SELECT
COLUMN_NAME,
[%] = CONVERT(DECIMAL(12,2),COUNT(COLUMN_NAME)* 100.0 / COUNT(*)OVER())
INTO #Prevalence
FROM INFORMATION_SCHEMA.COLUMNS
GROUP BY COLUMN_NAME
-- Do the columns differ on datatype across the schemas and tables?
SELECT DISTINCT
C1.COLUMN_NAME,
C1.TABLE_SCHEMA,
C1.TABLE_NAME,
C1.DATA_TYPE,
C1.CHARACTER_MAXIMUM_LENGTH,
C1.NUMERIC_PRECISION,
C1.NUMERIC_SCALE,
[%]
FROM INFORMATION_SCHEMA.COLUMNS C1
INNER JOIN INFORMATION_SCHEMA.COLUMNS C2 ON C1.COLUMN_NAME = C2.COLUMN_NAME
INNER JOIN #Prevalence p ON p.COLUMN_NAME = C1.COLUMN_NAME
WHERE ((C1.DATA_TYPE != C2.DATA_TYPE)
OR (C1.CHARACTER_MAXIMUM_LENGTH != C2.CHARACTER_MAXIMUM_LENGTH)
OR (C1.NUMERIC_PRECISION != C2.NUMERIC_PRECISION)
OR (C1.NUMERIC_SCALE != C2.NUMERIC_SCALE))
ORDER BY [%] DESC, C1.COLUMN_NAME, C1.TABLE_SCHEMA, C1.TABLE_NAME
By running this code, you will get a list of columns that have mismatched datatypes across tables in the database.
It is worth noting that some columns may be identified as having mismatched datatypes, but they may actually describe different things. In such cases, it is recommended to give them different names to avoid confusion.
Additionally, some of the identified mismatched columns may be due to outdated views. In such cases, using the sp_refreshview
stored procedure can help correct the issue.
Correcting the identified mismatched columns may involve updating the column datatypes and potentially modifying any client applications that use these columns. It is important to ensure that the correct datatype is used for each column and to update any affected client applications accordingly.
On a pre-emptive note, this utility can also be used as part of a quality assurance process to ensure that columns in new or amended tables are defined consistently across different tables before being migrated to a production environment.
In conclusion, the utility described in this article provides a valuable tool for SQL Server DBAs and developers to quickly identify columns with mismatched datatypes across tables in a database. By recognizing and correcting these columns, you can improve performance, data integrity, and maintainability in your SQL Server environment.
Author: Ian Stirk