Problem:
In my work, I often need to replace sensitive information in SQL Server databases with random strings. This is especially useful when restoring a production database to a test environment or when generating dummy data for development purposes. I have specific requirements for the random string generation:
- I need to be able to define the length of the string within a range.
- I want to be able to generate the exact same strings repeatedly if needed, to ensure consistent data quantity and quality.
- I want to be able to generate random strings with simple patterns, such as the postal code format in Canada.
Solution:
There are several ways to generate random strings in T-SQL, such as using Rand(), NewID(), CRYPT_GEN_RANDOM(), and Convert/Cast/Substring functions. However, using pure T-SQL has some disadvantages:
- Non-deterministic functions like Rand(), NewID(), and CRYPT_GEN_RANDOM() cannot be used inside a UDF (User-Defined Function), requiring an additional layer to bypass this limitation.
- The performance of pure T-SQL solutions is compromised for heavy-load string generation.
To overcome these limitations, a CLR (Common Language Runtime) function can be used. CLR functions are better positioned for string manipulation inside SQL Server. Here, I will provide two CLR functions that meet the requirements mentioned above:
- fn_random_string: This function generates a random string with a specified length and seed parameter, ensuring repeatability with the same random string when using the same seed.
- fn_random_pattern: This function generates a random string with a simple pattern defined by the pattern parameter. It also has a seed parameter to ensure repeatable string generations.
Here is an example of how to create and use these CLR functions:
CREATE ASSEMBLY clr_random_string
FROM 'C:\mssqltips\Random_String\bin\CLR_Rand_String.dll'
WITH PERMISSION_SET = SAFE;
CREATE FUNCTION dbo.ucf_random_string (@minLen INT, @maxLen INT, @seed INT = 0)
RETURNS NVARCHAR(MAX) WITH EXECUTE AS CALLER
AS EXTERNAL NAME [clr_random_string].[UserDefinedFunctions].fn_random_string;
CREATE FUNCTION dbo.ucf_random_pattern (@pattern NVARCHAR(MAX), @seed INT = 0)
RETURNS NVARCHAR(MAX) WITH EXECUTE AS CALLER
AS EXTERNAL NAME [clr_random_string].[UserDefinedFunctions].fn_random_pattern;
-- Generate a single random string
SELECT RandStr = dbo.ucf_random_string(10, 30, DEFAULT),
RandPattern = dbo.ucf_random_pattern('What is the time, Mr. @@@@@@? It is ! am', DEFAULT);
-- Generate random Canada Post Code / US zip code
SELECT TOP 10 Canada_PostCode = UPPER(dbo.ucf_random_pattern('@!@ !@!', ROW_NUMBER() OVER (ORDER BY column_id))),
US_ZipCode = dbo.ucf_random_pattern('!!!!!-!!!!', CEILING(RAND(column_id)) + ROW_NUMBER() OVER (ORDER BY column_id))
FROM sys.all_columns;Performance Comparison between CLR and T-SQL:
It is important to compare the performance of CLR and T-SQL solutions. Here is an example of how to test the performance:
DECLARE @i INT = 1, @start_time DATETIME = GETDATE();
DECLARE @str VARCHAR(8000), @seed INT;
DECLARE @t_tsql TABLE (run_count INT, duration_ms INT); -- for T-SQL execution stats
DECLARE @t_clr TABLE (run_count INT, duration_ms INT); -- for CLR execution stats
-- Run T-SQL solution 20,000 times
WHILE @i <= 20000
BEGIN
SET @seed = @i;
EXEC dbo.usp_generateIdentifier
@minLen = 2000,
@maxLen = 4000,
@seed = @seed OUTPUT,
@string = @str OUTPUT;
IF (@i % 2000 = 0)
INSERT INTO @t_tsql (run_count, duration_ms)
SELECT @i, DATEDIFF(ms, @start_time, GETDATE());
SET @i = @i + 1;
END
SET @i = 1;
SET @start_time = GETDATE();
-- Run CLR solution 20,000 times
WHILE @i <= 20000
BEGIN
SET @seed = @i;
SELECT @str = dbo.ucf_random_string(2000, 4000, @seed);
IF (@i % 2000 = 0)
INSERT INTO @t_clr (run_count, duration_ms)
SELECT @i, DATEDIFF(ms, @start_time, GETDATE());
SET @i = @i + 1;
END
SELECT t1.run_count, tsql_duration = t1.duration_ms, clr_duration = t2.duration_ms
FROM @t_tsql t1
INNER JOIN @t_clr t2 ON t1.run_count = t2.run_count;Conclusion:
Generating random strings in SQL Server can be achieved using CLR functions, which provide better performance and flexibility compared to pure T-SQL solutions. By using CLR functions, you can easily generate random strings with specific length and patterns, ensuring repeatability when needed.