Azure Data Factory is a powerful tool for orchestrating and managing data workflows in the cloud. In previous articles, we discussed how to create pipelines with multiple activities and how to schedule pipeline executions using triggers. In this article, we will explore how to control the dependencies between activities and pipeline runs in Azure Data Factory.
Dependency between Activities
Activities in a pipeline that are not connected to one another have no dependency and can run in parallel. When you connect one activity to another, the downstream activity waits for the upstream one and, by default, runs only if the upstream activity succeeds. You can change this behavior by choosing one of four dependency conditions:
- Success: The next activity will be executed if the current activity succeeds.
- Failure: The next activity will be executed if the current activity fails.
- Completion: The next activity will be executed regardless of the result of the current activity.
- Skip: The next activity will be executed if the current activity is skipped, that is, if the current activity never ran because its own dependency condition was not met.
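In the pipeline's JSON definition, these options are recorded in each activity's dependsOn list as the values Succeeded, Failed, Completed, and Skipped. The Python sketch below builds one such entry and models how a condition is matched against an upstream status. The helper names (depends_on, condition_met) and the activity name GetFileMetadata are made up for illustration, and the matching logic is a simplified model, not the service's implementation.

```python
# Map the options shown in the UI to the values used in the pipeline JSON.
UI_TO_JSON = {
    "Success": "Succeeded",
    "Failure": "Failed",
    "Completion": "Completed",
    "Skip": "Skipped",
}


def depends_on(activity, *ui_options):
    """Build one dependsOn entry for a downstream activity (hypothetical helper)."""
    return {
        "activity": activity,
        "dependencyConditions": [UI_TO_JSON[option] for option in ui_options],
    }


def condition_met(condition, upstream_status):
    """Simplified model: does an upstream status satisfy a dependency condition?"""
    if condition == "Completed":
        # Completion covers both outcomes of an activity that actually ran.
        return upstream_status in ("Succeeded", "Failed")
    return condition == upstream_status


entry = depends_on("GetFileMetadata", "Failure")
print(entry)
print(condition_met("Completed", "Failed"))
```

Listing more than one condition in the same entry is treated here as "any of these", which is why dependencyConditions is a list.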
To configure the dependency between activities, drag a connector from the upstream activity to the downstream activity on the pipeline canvas and choose the condition for it. The color of the connecting arrow shows the condition: green for success, red for failure, blue for completion, and grey for skip.
Example
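Let’s consider a scenario where we have a pipeline that checks the existence of a file in an Azure Storage Account using the Get Metadata activity. If the file exists, the pipeline will execute a Copy activity to copy the file to another storage account. If the file does not exist, the pipeline will execute a Lookup activity to search for the file in a different storage account. This design relies on the Get Metadata activity failing when the file is missing, which happens only when the Exists field is not requested in its field list; if Exists is requested, the activity succeeds and reports that the file does not exist in its output instead.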
We can configure the dependencies as follows:
- The Get Metadata activity is connected to the Copy activity with a green arrow, indicating that the Copy activity should only be executed if the Get Metadata activity succeeds.
- The Get Metadata activity is connected to the Lookup activity with a red arrow, indicating that the Lookup activity should only be executed if the Get Metadata activity fails.
- The Get Metadata activity is connected to a Stored Procedure execution activity with a blue arrow, indicating that the Stored Procedure should be executed regardless of the result of the Get Metadata activity.
- The Copy activity is connected to another Stored Procedure execution activity with a grey arrow, indicating that the Stored Procedure should be executed if the Copy activity is skipped.
By configuring these dependencies, we can ensure that the pipeline executes the appropriate actions based on the result of each activity.
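The four connections above can be sketched as a pipeline fragment and walked through for both outcomes of the Get Metadata activity. This is a minimal Python sketch: the activity names are hypothetical, typeProperties are omitted, and the simulate function is a simplified model of how dependency conditions are evaluated (all dependencies must be satisfied, and any listed condition satisfies a dependency), not the service's own logic.

```python
def dep(activity, condition):
    """One dependsOn entry with a single dependency condition."""
    return {"activity": activity, "dependencyConditions": [condition]}


# Activity names are hypothetical; typeProperties are left out for brevity.
activities = [
    {"name": "GetFileMetadata", "type": "GetMetadata", "dependsOn": []},
    {"name": "CopyFile", "type": "Copy",
     "dependsOn": [dep("GetFileMetadata", "Succeeded")]},      # green arrow
    {"name": "LookupFile", "type": "Lookup",
     "dependsOn": [dep("GetFileMetadata", "Failed")]},         # red arrow
    {"name": "LogMetadataRun", "type": "SqlServerStoredProcedure",
     "dependsOn": [dep("GetFileMetadata", "Completed")]},      # blue arrow
    {"name": "LogCopySkipped", "type": "SqlServerStoredProcedure",
     "dependsOn": [dep("CopyFile", "Skipped")]},               # grey arrow
]


def condition_met(condition, status):
    if condition == "Completed":
        return status in ("Succeeded", "Failed")
    return condition == status


def simulate(activities, outcomes):
    """Return the status of every activity.

    outcomes maps an activity name to "Succeeded" or "Failed" for activities
    that run; activities not listed are assumed to succeed. The activities
    list must be in dependency order.
    """
    status = {}
    for act in activities:
        runs = all(
            any(condition_met(c, status[d["activity"]])
                for c in d["dependencyConditions"])
            for d in act["dependsOn"]
        )
        status[act["name"]] = outcomes.get(act["name"], "Succeeded") if runs else "Skipped"
    return status


file_found = simulate(activities, {})
file_missing = simulate(activities, {"GetFileMetadata": "Failed"})
for name in file_found:
    print(f"{name}: found={file_found[name]}, missing={file_missing[name]}")
```

In this model the second Stored Procedure runs only in the failure case: the Copy activity is skipped because the Get Metadata activity did not succeed, and that skip is what satisfies the grey arrow.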
Tumbling Window Trigger Dependency
In addition to controlling dependencies between activities, Azure Data Factory also allows you to configure dependencies between tumbling windows in a Tumbling Window trigger. A Tumbling Window trigger fires at a periodic time interval from a specified start time while retaining state. Its windows are a series of fixed-size, non-overlapping, and contiguous time intervals.
With Tumbling Window dependency, you can ensure that the preceding window is completed successfully before proceeding with the next window. This can be a self-dependency, where the dependency is on the preceding windows in the same trigger, or a dependency on another Tumbling Window trigger.
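To configure a Tumbling Window trigger dependency, you can specify the offset and size of the dependency window. The offset is a timespan value that shifts the dependency window relative to the current window. It can be positive or negative when the dependency is on another trigger, but it is required and must be negative for a self-dependency. The size is a positive timespan value that sets the length of the dependency window; if it is omitted, the dependency window has the same size as the window of the trigger being depended on.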
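As an illustration, the dependency settings live in the dependsOn list of the trigger definition. The sketch below assembles such a definition as a Python dictionary and prints it as JSON. The trigger and pipeline names (DailyWindowTrigger, UpstreamTrigger, ProcessDailyFiles), the start time, and the maxConcurrency value are assumptions chosen for the example, and only the properties relevant to dependencies are shown.

```python
import json

trigger = {
    "name": "DailyWindowTrigger",  # hypothetical trigger name
    "properties": {
        "type": "TumblingWindowTrigger",
        "typeProperties": {
            "frequency": "Hour",
            "interval": 24,  # 24-hour windows
            "startTime": "2024-01-01T08:00:00Z",  # example start time
            "maxConcurrency": 1,
            "dependsOn": [
                {
                    # Dependency on earlier windows of this same trigger.
                    "type": "SelfDependencyTumblingWindowTriggerReference",
                    "offset": "-1.00:00:00",  # negative timespan: one day back
                    "size": "1.00:00:00",     # one-day dependency window
                },
                {
                    # Dependency on another tumbling window trigger.
                    # offset and size are left out here, so the dependency
                    # covers the same window as this trigger's own.
                    "type": "TumblingWindowTriggerDependencyReference",
                    "referenceTrigger": {
                        "referenceName": "UpstreamTrigger",  # hypothetical
                        "type": "TriggerReference",
                    },
                },
            ],
        },
        "pipeline": {
            "pipelineReference": {
                "type": "PipelineReference",
                "referenceName": "ProcessDailyFiles",  # hypothetical
            }
        },
    },
}

print(json.dumps(trigger, indent=2))
```

Timespans use the d.hh:mm:ss form, so "-1.00:00:00" means one day back.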
Example
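Let’s say we have a Tumbling Window trigger whose windows are 24 hours long and start at 8:00 AM each day. We can configure a self-dependency on this trigger, specifying an offset of -1 day and a size of 1 day. The offset must be negative because a window can only depend on windows that come before it. With these values, each window waits until the previous day's window has completed successfully before it starts.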
By configuring this dependency, we can ensure that the pipeline only runs when the previous window has completed successfully, providing a reliable and consistent data workflow.
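To make the arithmetic concrete, the sketch below computes which time range a given window depends on for a self-dependency with an offset of minus one day and a size of one day. It assumes the dependency window starts at the current window's start plus the offset and lasts for the given size; the function name dependency_window and the dates are made up for the example.

```python
from datetime import datetime, timedelta


def dependency_window(window_start, offset, size):
    """Return (start, end) of the window that the current window waits on.

    Assumption: the dependency window begins at window_start + offset and
    is `size` long.
    """
    start = window_start + offset
    return start, start + size


# The window that starts on 2 January at 8:00 AM.
window_start = datetime(2024, 1, 2, 8, 0)

dep_start, dep_end = dependency_window(
    window_start,
    offset=timedelta(days=-1),  # self-dependency: look one day back
    size=timedelta(days=1),
)
print(f"Window starting {window_start} waits on {dep_start} to {dep_end}")
```

Under that assumption, the dependency window ends exactly where the current window begins, so each run waits on the window immediately before it.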
Conclusion
In this article, we explored how to configure dependencies between activities and pipeline runs in Azure Data Factory. By controlling the dependencies, you can ensure that your data workflows are executed in the desired order and with the appropriate actions based on the result of each activity. Stay tuned for more articles in this series!