There may come a time when you are faced with the challenge of processing a non-standard input file format in SQL Server Integration Services (SSIS). Such files can be tricky to handle, especially when they contain metadata rows at both the start and the end of the file. In this article, we will explore a simple and efficient approach to tackle this problem.
Let’s consider an example of a feed file with the following structure:
| Col1      | Col2 | Col3 |
|-----------|------|------|
| xxx start |      |      |
| x         | y    | z    |
| x         | y    | z    |
| x         | y    | z    |
| xxx end   |      |      |
In this file, the “x,y,z” records are the ones of importance, along with the column headers. The challenge is that the metadata rows do not follow the same format as the data rows: they consist of a single field, “xxx start” and “xxx end”, with no comma separators, while the data rows follow the “x,y,z” pattern.
There are various technical solutions for this, such as a Script Component or a Conditional Split transformation. In this article, however, we will focus on a simple approach that avoids custom script code and keeps the package easily maintainable.
Here’s how we can tackle this problem:
- Import the entire file into a staging table using a Flat File source with the “Ragged right” format. Define a single column that is 8000 characters long; its name is irrelevant, as it is just a placeholder. The staging table itself has a single column defined as varchar(8000). This ensures that the entire file is imported successfully, including the problematic rows.
- Remove the extra rows from the staging table. Here a simple DELETE statement with a WHERE clause is enough, because the problem rows have a constant length that is shorter than any normal data row. Alternatively, you can filter on the column delimiters, or add an identity column to the table to identify the problem rows. A T-SQL sketch of this step appears after this list.
- Export the remaining rows to a temporary file. Define a Flat File destination with the “Ragged right” format, again with a single column, use the staging table as the source, and write its contents to this staging flat file.
- Finally, treat the staging file as a standard flat file and import it as usual in another data flow task. This allows the SSIS designer to determine the column names from the source file. Although this adds a couple of data flow tasks to the package, the process remains straightforward and transparent.
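To make the staging step more concrete, here is a minimal T-SQL sketch of the staging table and the clean-up delete. The table name, column name, length threshold, and the comma-based alternative are all assumptions for illustration; adjust them to match your actual feed.

```sql
-- Staging table: a single wide column that holds each raw line of the feed.
-- The name and the 8000-character width are placeholders matching the
-- "Ragged right" Flat File source described above.
CREATE TABLE dbo.FeedStaging
(
    RawLine varchar(8000) NULL
);

-- Remove the metadata rows. The assumption here is that "xxx start" and
-- "xxx end" are shorter than any real data row, so a simple length check
-- is enough.
DELETE FROM dbo.FeedStaging
WHERE LEN(RawLine) < 10;

-- Alternative (also an assumption): keep only rows that contain the
-- column delimiter, which removes any row without a comma.
-- DELETE FROM dbo.FeedStaging
-- WHERE RawLine NOT LIKE '%,%';
```

After this delete, the staging table holds only the header row and the data rows, ready to be written back out by the Flat File destination in the next step.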
By following this approach, you can encapsulate the entire process in a single package without the need for external scripts or additional dependencies. This ensures that the package is easily maintainable and can be deployed as a standalone dtsx file when moving between environments.
In conclusion, processing non-standard input files in SQL Server can be challenging, but with the right approach, it can be simplified and made more efficient. By using a staging table and a simple data flow process, you can handle files with metadata rows at both the start and the end, while still allowing the designer to determine the column names from the source file. This approach ensures a clean and maintainable solution for processing non-standard input files in SQL Server.