In today’s data-driven world, organizations often need to process large amounts of data efficiently and automatically. Azure Data Factory provides a powerful platform for orchestrating and automating data workflows. In this article, we will explore how to automatically execute an Azure Data Factory pipeline every time an email with a predefined subject and an attachment arrives.
Prerequisites
Before we dive into the step-by-step process, let’s take a look at the Azure resources we will be using:
- Azure Data Factory
- Azure Storage
- Azure Logic App
Step 1: Create a Logic App Workflow
The first step is to create a Logic App workflow that will be triggered whenever a new email arrives. In the Logic App Designer, add a trigger from the Office 365 Outlook category and configure it to match the desired email subject.
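In the Logic App’s code view, the trigger portion of the workflow definition takes roughly the following shape. This is a sketch built as a Python dict so its structure can be checked; the subject filter, polling interval, and connection reference are placeholders, and the exact query field names may differ slightly from the connector’s current schema.

```python
# Sketch of the Office 365 Outlook "When a new email arrives" trigger as it
# appears in the Logic App workflow-definition JSON (code view).
# Subject, folder, and connection values below are illustrative placeholders.
trigger = {
    "When_a_new_email_arrives": {
        "type": "ApiConnection",
        "recurrence": {"frequency": "Minute", "interval": 3},  # polling cadence
        "inputs": {
            "host": {
                "connection": {
                    "name": "@parameters('$connections')['office365']['connectionId']"
                }
            },
            "method": "get",
            "path": "/Mail/OnNewEmail",
            "queries": {
                "folderPath": "Inbox",
                "subjectFilter": "Daily Sales Report",  # the predefined subject
                "fetchOnlyWithAttachment": True,        # only emails with attachments
                "includeAttachments": True,             # expose attachments to later steps
            },
        },
    }
}

# The subject filter and attachment flags are what gate the workflow run.
queries = trigger["When_a_new_email_arrives"]["inputs"]["queries"]
print(queries["subjectFilter"])
```

Matching on the subject in the trigger itself (rather than with a later condition action) avoids a workflow run for every unrelated email.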
Step 2: Retrieve the Email Attachment
After the trigger, add a “For each” action to iterate over the attachments of the email. Use the attachment parameter from the trigger output to access the attachment details.
Step 3: Save the Attachment in Azure Storage
Inside the “For each” block, add a “Get Attachment” action to retrieve the attachment content. Use the message ID and attachment ID parameters from the trigger output to specify the attachment to retrieve.
Next, add a “Create blob” action to save the attachment in Azure Storage. Set the Storage Account connection and specify the folder path where the attachment will be saved. You can use dynamic content to provide a flexible folder path. Set the Blob name and Blob Content parameters using the output of the “Get Attachment” action. If only certain file types should be processed, add a condition on the attachment’s content type so that other attachments are skipped.
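Put together, the actions from Steps 2 and 3 look roughly like the sketch below in the workflow definition. The container path, connection names, and connector operation paths are assumptions for illustration; the real values come from your own connections and storage layout.

```python
# Sketch of the "For each" / "Get Attachment" / "Create blob" actions from
# Steps 2-3, as workflow-definition JSON. Folder path and connection names
# are placeholders; connector paths are approximate.
actions = {
    "For_each_attachment": {
        "type": "Foreach",
        # iterate over the attachments array exposed by the trigger
        "foreach": "@triggerBody()?['attachments']",
        "actions": {
            "Get_Attachment": {
                "type": "ApiConnection",
                "inputs": {
                    "host": {
                        "connection": {
                            "name": "@parameters('$connections')['office365']['connectionId']"
                        }
                    },
                    "method": "get",
                    # message ID from the trigger, attachment ID from the loop item
                    "path": "/Mail/@{triggerBody()?['id']}/Attachments/"
                            "@{items('For_each_attachment')?['id']}",
                },
            },
            "Create_blob": {
                "type": "ApiConnection",
                "runAfter": {"Get_Attachment": ["Succeeded"]},
                "inputs": {
                    "host": {
                        "connection": {
                            "name": "@parameters('$connections')['azureblob']['connectionId']"
                        }
                    },
                    "method": "post",
                    "path": "/datasets/default/files",
                    "queries": {
                        "folderPath": "/input-files",  # source folder watched by ADF
                        "name": "@body('Get_Attachment')?['name']",
                    },
                    "body": "@body('Get_Attachment')?['contentBytes']",
                },
            },
        },
    }
}

print(actions["For_each_attachment"]["foreach"])
```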
Step 4: Create a Data Factory Pipeline
Create a new Azure Data Factory Pipeline that will be responsible for copying the attachment from the source folder to the destination folder. Add a Copy data activity to the pipeline and configure it to copy the attachment file.
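A minimal pipeline of this shape, as it would appear in the Data Factory JSON view, is sketched below. The pipeline and dataset names (“ProcessEmailAttachment”, “SourceAttachment”, “DestinationFolder”) are assumptions; the datasets would point at the source and destination blob folders, here using Binary datasets so files are copied byte-for-byte.

```python
# Sketch of a minimal Data Factory pipeline with a single Copy data activity.
# Pipeline and dataset names are illustrative placeholders.
pipeline = {
    "name": "ProcessEmailAttachment",
    "properties": {
        "activities": [
            {
                "name": "CopyAttachment",
                "type": "Copy",
                "inputs": [
                    {"referenceName": "SourceAttachment", "type": "DatasetReference"}
                ],
                "outputs": [
                    {"referenceName": "DestinationFolder", "type": "DatasetReference"}
                ],
                "typeProperties": {
                    # Binary source/sink copy the file as-is, with no schema mapping
                    "source": {"type": "BinarySource"},
                    "sink": {"type": "BinarySink"},
                },
            }
        ]
    },
}

copy_activity = pipeline["properties"]["activities"][0]
print(copy_activity["name"])
```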
Step 5: Configure the Trigger
Create a Storage events trigger for the Data Factory Pipeline. This trigger will start the pipeline whenever a new file is created in the source folder. Start the trigger if it is not already running.
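In JSON, a Storage events trigger of this kind is a `BlobEventsTrigger` that fires on `Microsoft.Storage.BlobCreated` events under a given path. The container path, trigger name, and referenced pipeline name below are placeholders, sketched under the assumption that the Logic App writes into an `input-files` container.

```python
# Sketch of a Storage events trigger (BlobEventsTrigger) that starts the
# pipeline whenever a new blob lands in the watched source folder.
# Names and paths are illustrative placeholders.
blob_trigger = {
    "name": "OnAttachmentCreated",
    "properties": {
        "type": "BlobEventsTrigger",
        "typeProperties": {
            # fire only for blobs created under the source folder
            "blobPathBeginsWith": "/input-files/blobs/",
            "ignoreEmptyBlobs": True,
            "events": ["Microsoft.Storage.BlobCreated"],
        },
        "pipelines": [
            {
                "pipelineReference": {
                    "referenceName": "ProcessEmailAttachment",
                    "type": "PipelineReference",
                }
            }
        ],
    },
}

print(blob_trigger["properties"]["typeProperties"]["events"])
```

Remember that the trigger only fires while it is started, and that event-based triggers require the Event Grid resource provider to be registered on the subscription.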
Step 6: Test the Workflow
Send an email to the configured Office 365 Outlook mailbox with the matching subject line and the attachment file. Once the email reaches the mailbox, the Logic App workflow will be triggered. You can view the run details of the workflow from the Azure portal.
Step 7: Monitor the Pipeline Execution
After the workflow run is completed, the attachment file should be available in the destination folder. You can monitor the pipeline execution from the Azure Data Factory Studio’s Monitor tab. Check the trigger and pipeline runs to ensure successful processing.
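Beyond the Monitor tab, recent runs can also be checked programmatically through Data Factory’s “Pipeline Runs - Query By Factory” REST operation, which accepts a filter body like the one sketched below. The pipeline name is a placeholder; the time window here covers the last hour.

```python
from datetime import datetime, timedelta, timezone

# Sketch of the request body for Data Factory's queryPipelineRuns REST
# operation, to check whether recent runs of the pipeline succeeded.
# "ProcessEmailAttachment" is a placeholder pipeline name.
now = datetime.now(timezone.utc)
query = {
    "lastUpdatedAfter": (now - timedelta(hours=1)).isoformat(),
    "lastUpdatedBefore": now.isoformat(),
    "filters": [
        {
            "operand": "PipelineName",
            "operator": "Equals",
            "values": ["ProcessEmailAttachment"],
        }
    ],
}

print(query["filters"][0]["values"])
```

The response lists each matching run with its status (Succeeded, Failed, InProgress), which can feed an alert or a dashboard.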
Conclusion
By combining the power of Logic App workflows, Azure Data Factory pipelines, and Azure Storage, we can automate the processing of email attachments through a Data Factory pipeline. This allows for efficient and reliable data processing when input data arrives via email. Monitoring the email inbox, the workflow trigger run history, and the Data Factory pipeline runs is crucial for ensuring successful end-to-end processing. With minor modifications, this workflow can also handle attachments that arrive on a schedule, keeping the file processing fully automated at regular intervals.