Introduction
As a SQL Server developer, you may come across scenarios where you need to process JSON files and transform the data into CSV files. Azure Data Factory provides data flow formatter transformations for processing such data within a pipeline. In this article, we discuss three of these formatters: Flatten, Parse, and Stringify.
Flatten Transformation
The flatten transformation converts array values inside hierarchical structures, such as JSON, into individual rows. Let’s consider an example where we have a JSON file with fields such as id, firstname, lastname, gender, age, and address. The address field is a complex type: an array whose elements contain streetAddress, city, and state.
To implement the flatten transformation, we create a new data flow in Azure Data Factory and add a source that points to the JSON file. In the source options, we select “Array of documents” as the document form, which applies when the file is a JSON array of records. We can then preview the source data and add the flatten transformation, selecting the address column in the “Unroll by” section. After resetting the schema, we add the address columns (streetAddress, city, and state) to the output. Finally, we add a sink to generate the CSV file in the desired location.
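To make the effect of “Unroll by” concrete, here is a minimal Python sketch of the same idea outside Azure Data Factory. It is only an illustration of the row-per-array-element behavior, not how the service implements it; the function names flatten_records and to_csv and the sample values are hypothetical.

```python
import csv
import io


def flatten_records(records, unroll_by, child_fields):
    """Emit one output row per element of the array stored under `unroll_by`."""
    rows = []
    for record in records:
        # Keep the scalar columns; the unrolled array column is replaced below.
        parent = {key: value for key, value in record.items() if key != unroll_by}
        # Assumption of this sketch: a missing or empty array yields no rows.
        for item in record.get(unroll_by) or []:
            row = dict(parent)
            for field in child_fields:
                row[field] = item.get(field)
            rows.append(row)
    return rows


def to_csv(rows):
    """Serialize a list of flat dictionaries to CSV text."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample shaped like the article's example file.
sample = [
    {
        "id": 1, "firstname": "Ann", "lastname": "Lee", "gender": "F", "age": 30,
        "address": [
            {"streetAddress": "1 Main St", "city": "Austin", "state": "TX"},
            {"streetAddress": "9 Oak Ave", "city": "Dallas", "state": "TX"},
        ],
    },
]

flat_rows = flatten_records(sample, "address", ["streetAddress", "city", "state"])
print(to_csv(flat_rows))
```

Each person appears once per address, with the parent columns repeated on every row, which is the shape a CSV sink needs.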
Parse Transformation
The parse transformation extracts specific values from columns that hold text in a document format, such as JSON, XML, or delimited text. Let’s consider an example where we have an Excel file with columns like Car_Id, Model, Colour, and json_value. We want to extract locationid and region from the JSON text in the json_value column using the parse transformation.
To implement the parse transformation, we create a new data flow in Azure Data Factory and add an Excel source dataset. We specify the Excel source file path and then add the parse transformation. In the parse transformation settings, we select JSON as the format and add new columns for locationid and region. Using the visual expression builder, we can convert these columns to string values. Finally, we add a sink to write the output to a CSV file.
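The same extraction can be sketched in plain Python to show what the parse step produces. This is an illustrative analogue under stated assumptions, not Azure Data Factory code: the helper parse_json_column and the sample rows are hypothetical, and rows whose text is not valid JSON are given empty values here as a choice of this sketch.

```python
import json


def parse_json_column(rows, column, fields):
    """Add one string column per requested field, read from the JSON text in `column`."""
    parsed_rows = []
    for row in rows:
        try:
            document = json.loads(row[column])
        except (KeyError, TypeError, ValueError):
            # Missing column, non-text value, or invalid JSON: nothing to extract.
            document = {}
        if not isinstance(document, dict):
            document = {}
        out = dict(row)
        for field in fields:
            value = document.get(field)
            # Convert extracted values to strings, mirroring the article's step.
            out[field] = None if value is None else str(value)
        parsed_rows.append(out)
    return parsed_rows


# Hypothetical rows shaped like the article's Excel example.
cars = [
    {"Car_Id": 1, "Model": "Civic", "Colour": "Red",
     "json_value": '{"locationid": 101, "region": "West"}'},
    {"Car_Id": 2, "Model": "Focus", "Colour": "Blue",
     "json_value": "not json"},
]

parsed = parse_json_column(cars, "json_value", ["locationid", "region"])
for row in parsed:
    print(row["Car_Id"], row["locationid"], row["region"])
```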
Stringify Transformation
The stringify transformation converts complex data types into string data types, which is useful when the sink, such as a CSV file, cannot store nested values. Let’s consider an example where we have a JSON file with a complex data type in the address field.
To implement the stringify transformation, we create a new data flow in Azure Data Factory and add a JSON source. We preview the JSON data and then add the stringify transformation. In its expression builder, we select the complex address field as the value to convert. Finally, we add a sink to write the output data to a CSV file.
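As a rough Python analogue of this step, the sketch below serializes the complex field to JSON text so that every column holds a flat value. The helper stringify_column, the output column name address_json, and the sample data are hypothetical, and dropping the original complex column is a choice made in this sketch rather than a statement about the service.

```python
import json


def stringify_column(rows, column, output_column):
    """Replace a complex column with its JSON text in a new string column."""
    result = []
    for row in rows:
        # Drop the complex column so every remaining value is CSV-friendly.
        out = {key: value for key, value in row.items() if key != column}
        # Compact separators keep the serialized text short.
        out[output_column] = json.dumps(row.get(column), separators=(",", ":"))
        result.append(out)
    return result


# Hypothetical record with a nested address, as in the article's example.
people = [
    {"id": 1, "firstname": "Ann",
     "address": {"streetAddress": "1 Main St", "city": "Austin", "state": "TX"}},
]

stringified = stringify_column(people, "address", "address_json")
print(stringified[0]["address_json"])
```

Because the result is ordinary JSON text, a downstream consumer can turn it back into a structure later, for example with a parse step.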
Conclusion
In this article, we discussed three data flow formatters: Flatten, Parse, and Stringify. These transformations are useful when a pipeline has to handle hierarchical data or structured text embedded in a column. Whether you need to convert array values into individual rows, extract specific values from text sources, or turn complex data types into strings, these formatters provide the tools to do so.