In today’s data-driven world, managing data and extracting value from it are crucial capabilities. Two of the most common building blocks for this are data cataloging and data warehousing. Data cataloging keeps track of metadata, so that data pipelines can locate and understand the data they depend on. Data warehousing, in turn, lets us process large volumes of data efficiently to derive insights.
Azure provides two powerful services for these capabilities – Azure Purview for data cataloging and governance, and Azure Synapse Analytics for data warehousing. In this article, we will explore how to integrate these two services to access data catalog assets hosted in Azure Purview from Azure Synapse.
Prerequisites
Before we begin, make sure you have the privileges needed to administer and operate Azure Purview and Azure Synapse on your Azure account. You will also need an Azure Purview account with some data repositories already cataloged, as well as an Azure Synapse workspace.
Configuring Azure Purview for Integration
To integrate Azure Purview with Azure Synapse, open Azure Synapse Studio and navigate to the Manage hub. Under the External connections section, you will find Azure Purview (Preview). Click on “Connect to a Purview account” and select your Purview account from the list. This registers the Purview account with the Synapse workspace and links the two services.
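The same connection can, in principle, be scripted rather than clicked through. The following is a minimal sketch, assuming the link is stored as a purviewConfiguration property on the Synapse workspace resource and can be set with an Azure Resource Manager PATCH request. The API version, the helper names, and all example values are assumptions for illustration and are not taken from this article; the code only builds the request and sends nothing.

```python
import json

ARM_ENDPOINT = "https://management.azure.com"
# Assumed Microsoft.Synapse API version; verify against the current REST reference.
API_VERSION = "2021-06-01"


def purview_resource_id(subscription_id: str, resource_group: str, account: str) -> str:
    """Build the ARM resource ID of a Purview account."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Purview/accounts/{account}"
    )


def build_connect_request(subscription_id: str, synapse_rg: str,
                          workspace: str, purview_id: str):
    """Return (method, url, body) for a workspace update that sets the Purview link.

    Hypothetical helper: the caller would still need to add an Azure AD
    bearer token and actually send the request.
    """
    url = (
        f"{ARM_ENDPOINT}/subscriptions/{subscription_id}"
        f"/resourceGroups/{synapse_rg}"
        f"/providers/Microsoft.Synapse/workspaces/{workspace}"
        f"?api-version={API_VERSION}"
    )
    body = {"properties": {"purviewConfiguration": {"purviewResourceId": purview_id}}}
    return "PATCH", url, json.dumps(body)


if __name__ == "__main__":
    # Placeholder names, not real resources.
    pid = purview_resource_id("<subscription-id>", "rg-governance", "contoso-purview")
    method, url, body = build_connect_request(
        "<subscription-id>", "rg-analytics", "contoso-synapse", pid
    )
    print(method, url)
    print(body)
```

Keeping the request construction separate from sending it makes the payload easy to inspect before anything is changed on the workspace.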
Once the integration is complete, you can access the Purview catalog from Synapse Studio. Navigate to the Data tab and select “Purview” from the search bar dropdown. Now you can search for data assets cataloged in Purview directly from Synapse Studio.
Exploring the Purview Catalog
When searching the Purview catalog from Synapse Studio, you can type the full or partial name of the database object you are looking for. The search results show a list of database objects that match the criteria. These results come from the connected Purview account, not from the Synapse pools.
Clicking on an item in the search results displays detailed information about the data asset. You can explore its schema, lineage, data classification, related database objects, and more. The Related tab can also help you find similar or related database objects.
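For readers who want to run the same kind of keyword search outside Synapse Studio, the sketch below builds a request against the Purview catalog search (Discovery) REST endpoint. The endpoint path, the api-version value, the payload fields, and the account name are assumptions for illustration and are not described in this article; check them against the current Purview REST reference. The request is only constructed, not sent, and a real call would also need an Azure AD bearer token.

```python
import json
import urllib.request

# Assumed api-version for the catalog search endpoint.
SEARCH_API_VERSION = "2022-03-01-preview"


def build_search_request(account_name: str, keywords: str,
                         limit: int = 10) -> urllib.request.Request:
    """Build (but do not send) a keyword search request for a Purview account.

    `keywords` can be a full or partial asset name, mirroring the search box
    in Synapse Studio.
    """
    url = (
        f"https://{account_name}.purview.azure.com"
        f"/catalog/api/search/query?api-version={SEARCH_API_VERSION}"
    )
    payload = {"keywords": keywords, "limit": limit}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # "contoso-purview" and "sales" are placeholder values.
    req = build_search_request("contoso-purview", "sales", limit=5)
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```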
Taking Actions in Synapse Studio
Once you have discovered a data asset of interest, you can act on it directly in Synapse Studio. The available actions include creating a linked service, an integration dataset, or a new data flow that sources data from the targeted object. The Connect and Develop menu items provide links to initiate these actions.
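To make these objects more concrete, here is a hedged sketch of the kind of JSON definitions that sit behind a linked service and an integration dataset. It assumes, purely for illustration, that the discovered asset is a folder of Parquet files in an Azure Data Lake Storage Gen2 account; the function names, resource names, and storage URL are hypothetical, and Synapse Studio generates its own definitions when you use the menu items.

```python
import json


def linked_service(name: str, storage_url: str) -> dict:
    """Definition of a linked service pointing at an ADLS Gen2 account (assumed source)."""
    return {
        "name": name,
        "properties": {
            "type": "AzureBlobFS",
            "typeProperties": {"url": storage_url},
        },
    }


def parquet_dataset(name: str, linked_service_name: str,
                    file_system: str, folder_path: str) -> dict:
    """Definition of an integration dataset that reads Parquet files via the linked service."""
    return {
        "name": name,
        "properties": {
            "type": "Parquet",
            "linkedServiceName": {
                "referenceName": linked_service_name,
                "type": "LinkedServiceReference",
            },
            "typeProperties": {
                "location": {
                    "type": "AzureBlobFSLocation",
                    "fileSystem": file_system,
                    "folderPath": folder_path,
                }
            },
        },
    }


if __name__ == "__main__":
    # Placeholder names for a hypothetical asset found through the catalog.
    ls = linked_service("ls_sales_lake", "https://contosolake.dfs.core.windows.net")
    ds = parquet_dataset("ds_sales_orders", ls["name"], "raw", "sales/orders")
    print(json.dumps(ls, indent=2))
    print(json.dumps(ds, indent=2))
```

The dataset refers to the linked service by name only, which is why the connection details can change without touching every dataset that uses them.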
The benefit of integrating Azure Purview with Azure Synapse is that you can access the catalog right within the operational console of a data warehousing environment. This eliminates the need to switch between services and simplifies the data sourcing process.
Conclusion
In this article, we learned how to integrate Azure Purview and Azure Synapse Analytics. By cataloging data in Purview and integrating it with Synapse, we can access data assets from the catalog directly in Synapse Studio. This integration streamlines the data warehousing process and provides a convenient way to manage and source data within a single environment.