Introduction
When working in Azure Data Factory, you often need to retrieve metadata about a file, such as its name, its size, and whether it exists. Subsequent activities in the pipeline can then use this metadata. In this article, we will walk through an example of gathering metadata from a file and using it in a pipeline.
Solution
Let’s consider a scenario where we want to retrieve the metadata of a file stored in a data lake folder and insert that metadata into a SQL table using a stored procedure. To begin, we will place a file called Source1.csv in the data lake folder.
Pipeline Design
In our pipeline, we will use a Get Metadata activity and a Stored Procedure activity, linked so that the Stored Procedure activity runs after the Get Metadata activity. The Get Metadata activity will retrieve the metadata from the dataset, and the Stored Procedure activity will insert that metadata into a SQL table.
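As a rough illustration of this design, the following Python sketch assembles a pipeline definition as a dictionary shaped like the JSON that Data Factory shows in its code view. The dataset, linked service, pipeline, procedure, and parameter names are hypothetical placeholders, and the choice of fields is an assumption; the article does not specify them.

```python
import json


def build_pipeline(dataset_name, sql_linked_service, procedure_name):
    """Build a two-activity pipeline definition as a plain dictionary.

    All names passed in are hypothetical placeholders for this sketch.
    """
    get_metadata = {
        "name": "Get Metadata1",
        "type": "GetMetadata",
        "typeProperties": {
            "dataset": {"referenceName": dataset_name, "type": "DatasetReference"},
            # Fields to request; this particular selection is an assumption.
            "fieldList": ["exists", "itemName", "size"],
        },
    }
    stored_procedure = {
        "name": "Stored Procedure1",
        "type": "SqlServerStoredProcedure",
        # Run only after the Get Metadata activity succeeds.
        "dependsOn": [
            {"activity": get_metadata["name"], "dependencyConditions": ["Succeeded"]}
        ],
        "linkedServiceName": {
            "referenceName": sql_linked_service,
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            "storedProcedureName": procedure_name,
            "storedProcedureParameters": {
                # Hypothetical parameter carrying one metadata value.
                "LINE": {
                    "value": {
                        "value": "@activity('Get Metadata1').output.itemName",
                        "type": "Expression",
                    },
                    "type": "String",
                }
            },
        },
    }
    return {
        "name": "MetadataPipeline",  # hypothetical pipeline name
        "properties": {"activities": [get_metadata, stored_procedure]},
    }


pipeline = build_pipeline("SourceFileDataset", "AzureSqlLinkedService", "InsertItemMeta")
print(json.dumps(pipeline, indent=2))
```

The dependsOn entry is what expresses the link between the two activities: the stored procedure only starts once the metadata has been retrieved successfully.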
Metadata Activity Field List
The Get Metadata activity provides a field list whose options include:
- Column Count: Total number of columns in the file or table.
- Content MD5: MD5 hash of the file.
- Exists: Checks if a file, folder, or table exists and returns true or false.
- Item Name: Name of a file or folder.
- Item Type: Returns “File” if the source is a file, and “Folder” if it is a folder.
- Last Modified: Returns the date and time the file or folder was last modified.
- Size: Returns the size of the file in bytes.
- Structure: Returns a list of column names and types for tables and files.
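To show how these options are referenced later in a pipeline, here is a small sketch that maps each option in the list above to the property name used in the activity's JSON output and builds the matching pipeline expression. The activity name "Get Metadata1" is a hypothetical placeholder, and the helper function is illustrative rather than part of any Azure SDK.

```python
# Field list options as labeled above, mapped to the property names
# used in the Get Metadata activity's JSON definition and output.
FIELD_NAMES = {
    "Column Count": "columnCount",
    "Content MD5": "contentMD5",
    "Exists": "exists",
    "Item Name": "itemName",
    "Item Type": "itemType",
    "Last Modified": "lastModified",
    "Size": "size",
    "Structure": "structure",
}


def output_expression(activity_name, label):
    """Return the expression that reads one metadata field from an
    activity's output. This is a hypothetical helper for illustration."""
    field = FIELD_NAMES[label]
    return f"@activity('{activity_name}').output.{field}"


# "Get Metadata1" is a placeholder activity name.
for label in FIELD_NAMES:
    print(label, "->", output_expression("Get Metadata1", label))
```

An expression such as the one produced for Item Name can be supplied as a parameter value in a downstream activity, for example the Stored Procedure activity used in this article.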
Storing Metadata
We will create a table called ITEM_META to store the metadata information. The metadata will be inserted into the LINE column of this table.
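The sketch below imitates this storage step locally so the shape of the data is easy to see. It uses an in-memory SQLite database as a stand-in for Azure SQL and a Python function in place of the stored procedure. The column type, the sample metadata values, and the decision to serialize the metadata as one JSON string are all assumptions made for illustration.

```python
import json
import sqlite3


def insert_metadata_line(conn, line):
    """Stand-in for the stored procedure: insert one value into LINE."""
    conn.execute("INSERT INTO ITEM_META (LINE) VALUES (?)", (line,))
    conn.commit()


# In-memory SQLite database used as a local stand-in for Azure SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ITEM_META (LINE TEXT)")  # column type is an assumption

# Hypothetical Get Metadata output for the Source1.csv file.
metadata = {"exists": True, "itemName": "Source1.csv", "itemType": "File"}

# Assumption: the metadata is stored as a single JSON string in LINE.
insert_metadata_line(conn, json.dumps(metadata, sort_keys=True))

for (line,) in conn.execute("SELECT LINE FROM ITEM_META"):
    print(line)
```

In the actual pipeline, the equivalent insert would be done by the stored procedure in the Azure SQL database, with the value passed in as a parameter from the Get Metadata output.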
Execute the Pipeline
When the pipeline runs, the Get Metadata activity retrieves the metadata, and the Stored Procedure activity inserts it into the ITEM_META table. You can check the results by querying the ITEM_META table in the Azure SQL database.
Conclusion
In this article, we have discussed the steps to work with the Get Metadata activity in Azure Data Factory. With this activity, we can retrieve metadata about the files being processed, such as existence, name, and size, and pass it to subsequent activities in the pipeline, as we did here with the Stored Procedure activity.