Azure Data Factory is a powerful tool for managing and orchestrating data workflows in the cloud. However, when working with multiple environments and teams, it becomes essential to maintain version control and streamline the deployment process. In this article, we will explore how to configure version control and deployment in Azure Data Factory using Azure DevOps.
Step 1: Configuring the Data Factory Repository
The first step is to configure the Data Factory repository in Azure DevOps. This allows us to store and manage our Data Factory resources, such as pipelines, datasets, and linked services, in a version control system. To do this, follow these steps:
- Go to the Manage hub in Azure Data Factory Studio and, under Source control, select Git configuration.
- In the right panel, press the Configure button.
- In the pop-up window, set Repository type to Azure DevOps Git and select the Azure Active Directory tenant used to connect to the repository.
- Enter the repository details (Azure DevOps organization, project, repository name, collaboration branch, and root folder) and click Apply. The collaboration branch holds all the Data Factory code; if it does not exist yet, create it here.
- The Git configuration details now appear in the Manage hub of Data Factory Studio. Use the Edit button if you later need to change them, for example to disable publishing Data Factory changes to the publish branch.
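Behind the UI, the Git settings from this step are stored on the factory resource as a repoConfiguration object. The Python sketch below builds that object to show how the dialog fields map onto it. The property names (type FactoryVSTSConfiguration, accountName, projectName, repositoryName, collaborationBranch, rootFolder, tenantId, disablePublish) follow the Microsoft.DataFactory/factories ARM schema to the best of our knowledge, so check them against the API version you target. The organization, project, repository, and branch values are hypothetical placeholders.

```python
import json
from typing import Any, Dict, Optional


def build_repo_configuration(
    account_name: str,
    project_name: str,
    repository_name: str,
    collaboration_branch: str = "main",  # hypothetical default; use your own branch
    root_folder: str = "/",
    tenant_id: Optional[str] = None,
    disable_publish: bool = False,
) -> Dict[str, Any]:
    """Build a repoConfiguration object for an Azure DevOps Git-backed factory.

    Property names are assumed from the FactoryVSTSConfiguration type of the
    Microsoft.DataFactory/factories ARM schema; verify them for your API version.
    """
    config: Dict[str, Any] = {
        "type": "FactoryVSTSConfiguration",
        "accountName": account_name,        # Azure DevOps organization
        "projectName": project_name,
        "repositoryName": repository_name,
        "collaborationBranch": collaboration_branch,
        "rootFolder": root_folder,          # folder that holds the ADF JSON files
        "disablePublish": disable_publish,  # True corresponds to disabling publish via Edit
    }
    if tenant_id is not None:
        # The Azure Active Directory tenant chosen in the Configure dialog
        config["tenantId"] = tenant_id
    return config


if __name__ == "__main__":
    # Hypothetical organization, project, and repository names
    cfg = build_repo_configuration("my-org", "my-project", "adf-repo", root_folder="/adf")
    print(json.dumps({"properties": {"repoConfiguration": cfg}}, indent=2))
```

Passing disable_publish=True corresponds to the Edit option described in the last step. The tenant is optional in this sketch because the dialog pre-selects your current tenant.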
Step 2: Publishing Changes to the Repository
Once the repository is configured, you can start creating or modifying your Data Factory resources. This includes pipelines, datasets, linked services, and triggers. After making the necessary changes, follow these steps to publish them to the repository:
- Create a new Data Factory Pipeline or modify an existing one.
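- Press Save all to commit your changes to the collaboration branch, then press Publish in the top menu. Publishing deploys the changes to the Dev data factory and generates ARM templates in the publish branch.
- In Azure DevOps, open the repository and select the collaboration branch. Inside the adf root folder, each resource type is stored as JSON files in its own subfolder, such as pipeline, dataset, linkedService, and trigger.
- The publish branch (adf_publish by default) contains a folder named after your Dev data factory. Inside it are the ARM template for the factory (ARMTemplateForFactory.json) and its parameters file (ARMTemplateParametersForFactory.json).
- Create a copy of the parameters file for each higher environment and set the values that differ there, such as the factory name and linked service connection details. For example, keep separate parameters files for Test and Production.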
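Copying the parameters file by hand is fine for two environments, but the same step can also be scripted. The following Python sketch assumes the standard ARM parameters layout ({"parameters": {"name": {"value": ...}}}). The output file naming, the factory names, and the linked service parameter name are hypothetical examples, not names that ADF requires.

```python
import copy
import json
from pathlib import Path
from typing import Any, Dict, List


def make_environment_parameters(base: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of an ARM parameters document with selected values replaced.

    Raises KeyError for override names missing from the base file, so a typo
    fails here instead of producing a parameters file the template rejects.
    """
    result = copy.deepcopy(base)  # never mutate the Dev parameters
    params = result["parameters"]
    for name, value in overrides.items():
        if name not in params:
            raise KeyError(f"parameter {name!r} not found in base parameters file")
        params[name] = {"value": value}
    return result


def write_environment_files(
    base_file: Path, overrides_by_env: Dict[str, Dict[str, Any]], out_dir: Path
) -> List[Path]:
    """Write one parameters file per environment (hypothetical naming scheme)."""
    base = json.loads(base_file.read_text(encoding="utf-8"))
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for env, overrides in overrides_by_env.items():
        target = out_dir / f"ARMTemplateParametersForFactory.{env}.json"
        doc = make_environment_parameters(base, overrides)
        target.write_text(json.dumps(doc, indent=4), encoding="utf-8")
        written.append(target)
    return written


if __name__ == "__main__":
    # In-memory example; "LS_Storage_connectionString" is a hypothetical parameter name
    base = {
        "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
        "contentVersion": "1.0.0.0",
        "parameters": {
            "factoryName": {"value": "adf-dev"},
            "LS_Storage_connectionString": {"value": ""},
        },
    }
    test_params = make_environment_parameters(base, {"factoryName": "adf-test"})
    print(test_params["parameters"]["factoryName"])
```

Rejecting unknown names is a deliberate choice. ADF derives parameter names from resource names, so a renamed linked service shows up as an error when the files are generated rather than as a failed release.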
Step 3: Deploying Changes to Higher Environments
Now that the changes are published to the repository, we can deploy them to higher environments, such as the Test and Production environments. To do this, follow these steps:
- In Azure DevOps, go to the Pipelines menu and select Releases.
- Press the New button and select the New release pipeline option.
- When asked to select a template, choose Empty job and name the stage DeployDevToTest.
- Add an artifact: select Azure Repos Git as the source type, provide the repository details, and choose the publish branch as the default branch, because that branch holds the ARM templates.
- Click on the “1 job, 0 task” link of the DeployDevToTest stage and add the ARM template deployment task.
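- Fill in the task details: task version, Azure subscription, resource group, location, template (ARMTemplateForFactory.json from the artifact), and template parameters (the Test copy of the parameters file).
- Leave the deployment mode set to Incremental, which is the default. Complete mode would delete any resources in the target resource group that are not defined in the template.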
- Save the pipeline and press the Create release button.
- In the pop-up window, select the DeployDevToTest stage to be executed manually and add any relevant comments in the Release description text box.
- Press the Create button to create the release.
- Go to the DeployDevToTest stage and press the Deploy button to start the deployment of the Data Factory code to the Test environment.
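The ARM template deployment task performs a deployment scoped to a resource group. To try the same deployment outside the release pipeline, for example while debugging a parameter value, the Azure CLI command az deployment group create does a comparable job. The Python sketch below only builds the argument list for that command. The resource group, file paths, and factory name are hypothetical, and actually running the command requires the Azure CLI and an authenticated az login session.

```python
import shlex
from typing import Dict, List, Optional


def build_deploy_command(
    resource_group: str,
    template_file: str,
    parameters_file: str,
    overrides: Optional[Dict[str, str]] = None,
    mode: str = "Incremental",
    deployment_name: Optional[str] = None,
) -> List[str]:
    """Build an `az deployment group create` argument list (not executed here)."""
    if mode not in ("Incremental", "Complete"):
        raise ValueError("mode must be 'Incremental' or 'Complete'")
    cmd = [
        "az", "deployment", "group", "create",
        "--resource-group", resource_group,
        "--template-file", template_file,
        "--parameters", f"@{parameters_file}",  # environment-specific parameters file
        "--mode", mode,                          # Incremental leaves other resources alone
    ]
    if deployment_name:
        cmd += ["--name", deployment_name]
    # Inline overrides come after the file so they take precedence
    for key, value in (overrides or {}).items():
        cmd += ["--parameters", f"{key}={value}"]
    return cmd


if __name__ == "__main__":
    # Hypothetical resource group and paths inside the publish branch
    cmd = build_deploy_command(
        resource_group="rg-adf-test",
        template_file="adf-dev/ARMTemplateForFactory.json",
        parameters_file="adf-dev/ARMTemplateParametersForFactory.test.json",
        overrides={"factoryName": "adf-test"},
    )
    print(shlex.join(cmd))
    # To run it for real: import subprocess; subprocess.run(cmd, check=True)
```

The override is placed after the parameters file on purpose. The Azure CLI evaluates --parameters values in order, so when the same name is assigned twice, the later value wins.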
Conclusion
Configuring version control and deployment for Azure Data Factory with Azure DevOps keeps every change to your pipelines, datasets, linked services, and triggers tracked in Git. It also lets you promote the same published ARM templates from Dev to Test and Production. Whenever the code changes and is published, create a new release and deploy it to bring the other environments up to date.