Published on November 7, 2021

Using Metadata-Driven Copy Tasks in Azure Data Factory

When working with Azure Data Factory pipelines, it is often necessary to copy data from a large number of objects, such as tables. Building such a pipeline by hand is tedious and error-prone. Thanks to the ADF metadata-driven copy task, however, creating a metadata-based pipeline has become much easier.

Pipeline Development

For this demonstration, we will use a flat file as the source, located in an Azure Data Lake Storage Gen2 folder. The destination will be an Azure SQL database table.

On the ADF authoring page, select the “Ingest” option and then choose the metadata-driven copy task. This will open the properties page where you can provide the dataset and control table name for storing the metadata information.

Next, set up the source data store by adding the data lake connections and specifying the folder path. You can also configure the file format settings by clicking on the “Detect Text Format” button. All required settings will be automatically populated, and you can preview the source data.

On the target page, provide the schema name and table name for the destination dataset. The table will be auto-created, or you can choose an existing table from the destination connection.
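
If you prefer to create the destination table yourself rather than rely on auto-creation, a simple table whose columns match the source file will do. Here is a minimal sketch, assuming a hypothetical dbo.SalesData table with illustrative column names and types:

    -- Hypothetical destination table; the column names and types are
    -- illustrative and should match the columns in your source flat file.
    CREATE TABLE dbo.SalesData
    (
        OrderId    INT            NOT NULL,
        OrderDate  DATE           NULL,
        CustomerId INT            NULL,
        Amount     DECIMAL(18, 2) NULL
    );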

In the mapping page, the source and destination columns will be automatically mapped if they have the same name. If the column names are different, you will need to manually map them.

Optionally, you can rename the copy data task and provide task descriptions and other settings options. Review the source and destination configurations on the summary page.

Once the deployment process is completed, the datasets and pipelines will be created, and a control table SQL script will be generated. Download the SQL script and execute it in SSMS to create the control table where all the metadata will be stored.
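
The exact script varies by Azure Data Factory version, but the control table it creates generally holds one row per object to copy, with the source and sink details stored as JSON. Below is a simplified, illustrative sketch; the real generated table contains additional columns:

    -- Simplified sketch of the metadata control table; the actual script
    -- generated by the copy data tool includes more columns. Names here
    -- are illustrative.
    CREATE TABLE dbo.MainControlTable
    (
        Id                   INT IDENTITY(1, 1) PRIMARY KEY,
        SourceObjectSettings NVARCHAR(MAX) NULL, -- JSON: source folder/file details
        SinkObjectSettings   NVARCHAR(MAX) NULL, -- JSON: destination schema/table
        CopyActivitySettings NVARCHAR(MAX) NULL, -- JSON: copy behavior options
        TopLevelPipelineName NVARCHAR(200) NULL,
        TriggerName          NVARCHAR(200) NULL
    );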

Now you can run the pipeline; the destination table will be created and populated with data. To copy data from another object with the same pipeline, simply add the new object's details to the control table, which the pipeline reads at run time.
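
For example, to register another flat file for copying, you could insert a row along these lines. The JSON property names here are illustrative; copy an existing row from your generated control table to see the exact structure your pipeline expects:

    -- Illustrative only: mirror the JSON structure of the rows the copy
    -- data tool generated, since the pipeline parses these settings at
    -- run time.
    INSERT INTO dbo.MainControlTable
        (SourceObjectSettings, SinkObjectSettings, TopLevelPipelineName)
    VALUES
    (
        N'{"folderPath": "raw/orders", "fileName": "orders2.csv"}',
        N'{"schema": "dbo", "table": "Orders2"}',
        N'MetadataDrivenCopyPipeline'
    );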

Limitations

There are a few limitations to consider when developing metadata-driven pipelines:

  • The copy data tool's metadata-driven pipelines do not currently support incremental data copying from files; you may need to build a custom pipeline for that scenario.
  • The integration runtime (IR) name, database type, and file format type cannot be parameterized in Azure Data Factory.
  • The SQL script generated by the copy data tool for the control table uses OPENJSON. If you are using SQL Server to host the control table, it must be SQL Server 2016 or later, with the database compatibility level at 130 or higher, so that OPENJSON is available (see the quick check below).
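
A quick way to verify OPENJSON support before running the generated script:

    -- Check the database compatibility level; OPENJSON requires
    -- SQL Server 2016+ and compatibility level 130 or higher.
    SELECT compatibility_level
    FROM sys.databases
    WHERE name = DB_NAME();

    -- Minimal OPENJSON usage of the kind the generated script relies on:
    SELECT *
    FROM OPENJSON(N'{"schema": "dbo", "table": "Orders"}')
    WITH (
        [schema] NVARCHAR(128) '$.schema',
        [table]  NVARCHAR(128) '$.table'
    );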

Conclusion

In this article, we discussed how to use the metadata-driven copy task to build a pipeline that can easily copy a large number of objects. By leveraging Azure Data Factory, developers can streamline the data copying process and save time and effort.
