Published on May 29, 2018

Handling Comma-Delimited Files in SQL Server

Processing comma-delimited (.csv) files from external sources is a common task in SQL Server. Various tools are available for the job, such as BCP, BULK INSERT, and SSIS. In this article, we will explore how to handle comma-delimited files using PowerShell and the SqlBulkCopy class.
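For comparison, the simplest of those alternatives is a one-off T-SQL BULK INSERT statement. The sketch below builds such a statement as a here-string; the table name and file path are hypothetical placeholders, and the statement could then be executed with Invoke-Sqlcmd or sqlcmd.exe:

```powershell
# A minimal sketch of the T-SQL alternative; dbo.ProductFeed and the file
# path are hypothetical placeholders, not part of the script in this article.
$bulkInsertSql = @'
BULK INSERT dbo.ProductFeed
FROM 'D:\Imports\ProductFeed.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);
'@
$bulkInsertSql
```

Note that plain BULK INSERT shares the weakness discussed later in this article: it splits on every field terminator, quoted or not.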

Let’s start by looking at a PowerShell script that utilizes the SqlBulkCopy method to import data from a .csv file into a SQL Server table:

# Load the ADO.NET assemblies used by SqlBulkCopy
[void][Reflection.Assembly]::LoadWithPartialName("System.Data")
[void][Reflection.Assembly]::LoadWithPartialName("System.Data.SqlClient")

# Positional arguments: date tag (not used below), connection string,
# source folder and destination table (the file must be named <Table>.csv)
$Date = $args[0]
$ConnectionString = $args[1]
$Directory = $args[2]
$Table = $args[3]
$file = "$Directory\$Table.csv"
$delim = ","
$FirstRowColumnNames = $true
$batchsize = 75000

# TableLock takes a bulk update lock for the duration of the load
$bulkcopy = New-Object Data.SqlClient.SqlBulkCopy($ConnectionString, [System.Data.SqlClient.SqlBulkCopyOptions]::TableLock)
$bulkcopy.DestinationTableName = $Table
$bulkcopy.BulkCopyTimeout = 0
$bulkcopy.BatchSize = $batchsize

$datatable = New-Object System.Data.DataTable
$reader = New-Object System.IO.StreamReader($file)
# Read the header row to find out how many columns the table needs
$columns = (Get-Content $file -First 1).Split($delim)

# Skip the header line in the stream if the first row holds column names
if ($FirstRowColumnNames -eq $true) {
    $null = $reader.ReadLine()
}

# Create one (untyped) DataTable column per CSV column
foreach ($column in $columns) {
    $null = $datatable.Columns.Add()
}

$i = 0

# Stream the file line by line, flushing a batch to the server every $batchsize rows
while ($null -ne ($line = $reader.ReadLine())) {
    $null = $datatable.Rows.Add($line.Split($delim))
    $i++

    if (($i % $batchsize) -eq 0) {
        $bulkcopy.WriteToServer($datatable)
        $datatable.Clear()
    }
}

# Flush any remaining rows that did not fill a complete batch
if ($datatable.Rows.Count -gt 0) {
    $bulkcopy.WriteToServer($datatable)
    $datatable.Clear()
}

$reader.Close()
$reader.Dispose()
$bulkcopy.Close()
$bulkcopy.Dispose()
$datatable.Dispose()
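Assuming the script above is saved as, say, Import-CsvTable.ps1 (the name is hypothetical), it is invoked with four positional arguments that map to $args[0]..$args[3]; remember that the file must be named <Table>.csv inside the given folder:

```powershell
# Hypothetical invocation; the server, folder and table names are placeholders.
# $args[0] (a date tag) is accepted but not used by the script body.
.\Import-CsvTable.ps1 "20180529" `
    "Server=MYSERVER;Database=Staging;Integrated Security=True" `
    "D:\Imports" `
    "ProductFeed"
```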

This script allows us to import several million rows from a .csv file into a SQL Server table in a matter of minutes, with minimal load on the server. However, there can be issues when the file contains a narrative or description field that includes commas within double quotes. For example:

ProductKey,Code,Description,Narrative,Comment
123,abc,product1,"Product reclassified, now freely available","reclassification by Bloggs, Bill"
345,fgh,product2,"Product withdrawn, issue with supplier","Product no longer available, issue logged re supplier by Soap, Joe"

These fields can cause the import process to break, resulting in errors about unexpected columns or end-of-row not in the expected location. To handle this, we need to isolate the commas within text between double quotes and ignore them during the import process.
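The failure is easy to reproduce in isolation: splitting one of the quoted sample rows on every comma produces more fields than the table has columns.

```powershell
# A minimal demonstration of the problem: each quoted field in this sample row
# contains one embedded comma, so a naive split yields 7 fields instead of 5.
$row = '123,abc,product1,"Product reclassified, now freely available","reclassification by Bloggs, Bill"'
$fields = $row.Split(',')
$fields.Count   # 7
```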

One approach is to replace the commas meant to be column delimiters with another delimiter, such as a pipe (|). Here’s how we can modify the script to achieve this:

# Load the ADO.NET assemblies used by SqlBulkCopy
[void][Reflection.Assembly]::LoadWithPartialName("System.Data")
[void][Reflection.Assembly]::LoadWithPartialName("System.Data.SqlClient")

# Positional arguments: date tag (not used below), connection string,
# source folder and destination table (the file must be named <Table>.csv)
$Date = $args[0]
$ConnectionString = $args[1]
$Directory = $args[2]
$Table = $args[3]
$file = "$Directory\$Table.csv"
$oldDelimiter = ","
$newDelimiter = "|"
$FirstRowColumnNames = $true
$batchsize = 75000

# TableLock takes a bulk update lock for the duration of the load
$bulkcopy = New-Object Data.SqlClient.SqlBulkCopy($ConnectionString, [System.Data.SqlClient.SqlBulkCopyOptions]::TableLock)
$bulkcopy.DestinationTableName = $Table
$bulkcopy.BulkCopyTimeout = 0
$bulkcopy.BatchSize = $batchsize

$datatable = New-Object System.Data.DataTable
$reader = New-Object System.IO.StreamReader($file)
# Read the header row to find out how many columns the table needs
$columns = (Get-Content $file -First 1).Split($oldDelimiter)

# Skip the header line in the stream if the first row holds column names
if ($FirstRowColumnNames -eq $true) {
    $null = $reader.ReadLine()
}

# Create one (untyped) DataTable column per CSV column
foreach ($column in $columns) {
    $null = $datatable.Columns.Add()
}

$i = 0

while ($null -ne ($line = $reader.ReadLine())) {
    $sp = 0      # running count of double quotes seen so far
    $point = 0   # current character position within the line
    [char[]]$larr = $line

    foreach ($item in $larr) {
        if ($item -eq '"') {
            $sp++
        }

        # A comma is a genuine column delimiter only when an even number of
        # quotes precede it, i.e. when we are not inside a quoted field
        if ($item -eq $oldDelimiter) {
            if (($sp % 2) -eq 0) {
                $line = $line.Remove($point,1).Insert($point,$newDelimiter)
            }
        }

        $point++
    }

    # Strip the now-redundant quotation marks, then split on the new delimiter
    $line = $line.Replace('"','')
    $null = $datatable.Rows.Add($line.Split($newDelimiter))
    $i++

    if (($i % $batchsize) -eq 0) {
        $bulkcopy.WriteToServer($datatable)
        $datatable.Clear()
    }
}

# Flush any remaining rows that did not fill a complete batch
if ($datatable.Rows.Count -gt 0) {
    $bulkcopy.WriteToServer($datatable)
    $datatable.Clear()
}

$reader.Close()
$reader.Dispose()
$bulkcopy.Close()
$bulkcopy.Dispose()
$datatable.Dispose()

This modified script distinguishes commas that are column delimiters from commas that are part of a text field: walking each line character by character, it counts double quotes and replaces a comma with the pipe delimiter only when an even number of quotes precede it, i.e. when the comma sits outside any quoted field. The now-redundant quotation marks are then stripped before the line is added to the DataTable.
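The quote-parity rule at the heart of the script can be sanity-checked in isolation on one of the sample rows (a standalone sketch, separate from the import script):

```powershell
# A comma is a real delimiter only when an even number of double quotes
# precede it; such commas are swapped for pipes, then the quotes are stripped.
$line = '123,abc,product1,"Product reclassified, now freely available","reclassification by Bloggs, Bill"'
$sp = 0
$chars = [char[]]$line
for ($point = 0; $point -lt $chars.Count; $point++) {
    if ($chars[$point] -eq '"') { $sp++ }
    elseif ($chars[$point] -eq ',' -and ($sp % 2) -eq 0) {
        # Same-length replacement, so positions in $line stay aligned with $chars
        $line = $line.Remove($point,1).Insert($point,'|')
    }
}
$line = $line.Replace('"','')
$line
# 123|abc|product1|Product reclassified, now freely available|reclassification by Bloggs, Bill
$line.Split('|').Count   # 5
```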

With this approach, we can handle comma-delimited files whose fields contain embedded commas within double quotes, ensuring a smooth import into SQL Server.

While this may not be the most elegant solution, it is effective and performs well. There may be other improved methods available, and we welcome any comments or feedback on this approach.
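One such alternative, offered as a sketch rather than a drop-in replacement: the .NET Microsoft.VisualBasic.FileIO.TextFieldParser class understands quoted fields natively, so the delimiter-swapping pass becomes unnecessary, and its ReadFields() output could feed the same DataTable/SqlBulkCopy pipeline. The example below writes one of the sample rows to a temporary file so it is self-contained:

```powershell
Add-Type -AssemblyName Microsoft.VisualBasic

# Write the sample row to a temp file so the sketch is self-contained
$tmp = Join-Path ([IO.Path]::GetTempPath()) "sample.csv"
'123,abc,product1,"Product reclassified, now freely available","reclassification by Bloggs, Bill"' |
    Set-Content $tmp

$parser = New-Object Microsoft.VisualBasic.FileIO.TextFieldParser($tmp)
$parser.TextFieldType = [Microsoft.VisualBasic.FileIO.FieldType]::Delimited
$parser.SetDelimiters(",")
$parser.HasFieldsEnclosedInQuotes = $true
$fields = $parser.ReadFields()   # string[]; embedded commas are preserved
$parser.Close()
$fields.Count   # 5
$fields[3]      # Product reclassified, now freely available
```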

