Published on

November 28, 2024

Generating Random Strings in SQL Server

Problem:

In my work, I often need to replace sensitive information in SQL Server databases with random strings. This is especially useful when restoring a production database to a test environment or when generating dummy data for development purposes. I have specific requirements for the random string generation:

  • I need to be able to define the length of the string within a range.
  • I want to be able to generate the exact same strings repeatedly if needed, to ensure consistent data quantity and quality.
  • I want to be able to generate random strings with simple patterns, such as the postal code format in Canada.

Solution:

There are several ways to generate random strings in T-SQL, such as using Rand(), NewID(), CRYPT_GEN_RANDOM(), and Convert/Cast/Substring functions. However, using pure T-SQL has some disadvantages:

  1. Non-deterministic functions like Rand(), NewID(), and CRYPT_GEN_RANDOM() cannot be used inside a UDF (User-Defined Function), requiring an additional layer to bypass this limitation.
  2. The performance of pure T-SQL solutions is compromised for heavy-load string generation.

To overcome these limitations, a CLR (Common Language Runtime) function can be used. CLR functions are better positioned for string manipulation inside SQL Server. Here, I will provide two CLR functions that meet the requirements mentioned above:

  1. fn_random_string: This function generates a random string with a specified length and seed parameter, ensuring repeatability with the same random string when using the same seed.
  2. fn_random_pattern: This function generates a random string with a simple pattern defined by the pattern parameter. It also has a seed parameter to ensure repeatable string generations.

Here is an example of how to create and use these CLR functions:

CREATE ASSEMBLY clr_random_string
FROM 'C:\mssqltips\Random_String\bin\CLR_Rand_String.dll'
WITH PERMISSION_SET = SAFE;

CREATE FUNCTION dbo.ucf_random_string (@minLen INT, @maxLen INT, @seed INT = 0)
RETURNS NVARCHAR(MAX) WITH EXECUTE AS CALLER
AS EXTERNAL NAME [clr_random_string].[UserDefinedFunctions].fn_random_string;

CREATE FUNCTION dbo.ucf_random_pattern (@pattern NVARCHAR(MAX), @seed INT = 0)
RETURNS NVARCHAR(MAX) WITH EXECUTE AS CALLER
AS EXTERNAL NAME [clr_random_string].[UserDefinedFunctions].fn_random_pattern;

-- Generate a single random string
SELECT RandStr = dbo.ucf_random_string(10, 30, DEFAULT),
       RandPattern = dbo.ucf_random_pattern('What is the time, Mr. @@@@@@? It is ! am', DEFAULT);

-- Generate random Canada Post Code / US zip code
SELECT TOP 10 Canada_PostCode = UPPER(dbo.ucf_random_pattern('@!@ !@!', ROW_NUMBER() OVER (ORDER BY column_id))),
             US_ZipCode = dbo.ucf_random_pattern('!!!!!-!!!!', CEILING(RAND(column_id)) + ROW_NUMBER() OVER (ORDER BY column_id))
FROM sys.all_columns;

Performance Comparison between CLR and T-SQL:

It is important to compare the performance of CLR and T-SQL solutions. Here is an example of how to test the performance:

DECLARE @i INT = 1, @start_time DATETIME = GETDATE();
DECLARE @str VARCHAR(8000), @seed INT;
DECLARE @t_tsql TABLE (run_count INT, duration_ms INT); -- for T-SQL execution stats
DECLARE @t_clr TABLE (run_count INT, duration_ms INT); -- for CLR execution stats

-- Run T-SQL solution 20,000 times
WHILE @i <= 20000
BEGIN
    SET @seed = @i;
    EXEC dbo.usp_generateIdentifier
        @minLen = 2000,
        @maxLen = 4000,
        @seed = @seed OUTPUT,
        @string = @str OUTPUT;

    IF (@i % 2000 = 0)
        INSERT INTO @t_tsql (run_count, duration_ms)
        SELECT @i, DATEDIFF(ms, @start_time, GETDATE());

    SET @i = @i + 1;
END

SET @i = 1;
SET @start_time = GETDATE();

-- Run CLR solution 20,000 times
WHILE @i <= 20000
BEGIN
    SET @seed = @i;
    SELECT @str = dbo.ucf_random_string(2000, 4000, @seed);

    IF (@i % 2000 = 0)
        INSERT INTO @t_clr (run_count, duration_ms)
        SELECT @i, DATEDIFF(ms, @start_time, GETDATE());

    SET @i = @i + 1;
END

SELECT t1.run_count, tsql_duration = t1.duration_ms, clr_duration = t2.duration_ms
FROM @t_tsql t1
INNER JOIN @t_clr t2 ON t1.run_count = t2.run_count;

Conclusion:

Generating random strings in SQL Server can be achieved using CLR functions, which provide better performance and flexibility compared to pure T-SQL solutions. By using CLR functions, you can easily generate random strings with specific length and patterns, ensuring repeatability when needed.

Click to rate this post!
[Total: 0 Average: 0]

Let's work together

Send us a message or book free introductory meeting with us using button below.