Many businesses today are looking to leverage Azure's data services to build an ELT process that moves data from their on-premises SQL Server instances to Azure Data Lake Storage. In this article, we will explore how to solve this problem using Azure Data Factory.
Prerequisite Resources
Before we begin, let’s make sure we have the necessary resources set up:
- Azure Data Factory V2: This will serve as the ELT orchestration tool.
- Azure Data Lake Storage Gen2: The data will be stored in ADLS2.
- On-Premises SQL Server: The source data will be stored in an on-premises SQL Server.
- Azure SQL Database (Standard): We will use Azure SQL Database to store the pipeline parameter table and its values.
- Azure Data Factory Self-Hosted IR: This will allow us to link our on-premises resources to Azure.
Prepare and Verify SQL Server Database Objects
First, we will open SQL Server Management Studio, connect to the on-premises SQL Server containing the databases we want to move, and verify that those databases contain tables.
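As a quick check, a query along the lines of the following can be run in SSMS against each source database (the database name used here is just a placeholder) to confirm that base tables exist:

```sql
-- Hypothetical check: list the user tables in one source database.
-- Replace SourceDatabase with the actual database name.
USE SourceDatabase;
GO

SELECT TABLE_CATALOG,
       TABLE_SCHEMA,
       TABLE_NAME
FROM   INFORMATION_SCHEMA.TABLES
WHERE  TABLE_TYPE = 'BASE TABLE'
ORDER BY TABLE_SCHEMA, TABLE_NAME;
```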
Prepare and Verify Azure SQL Database Objects
Next, we will create a pipeline parameter table in our Azure SQL Database to store the table names, catalog names, and process flags. This table will drive the pipeline configurations at runtime.
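A minimal sketch of such a table is shown below; the table and column names (dbo.pipeline_parameter, table_catalog, table_name, process_flag) are illustrative and can be adjusted or extended as needed:

```sql
-- Illustrative pipeline parameter table in Azure SQL Database.
-- One row per source table; process_flag controls whether the table is loaded.
CREATE TABLE dbo.pipeline_parameter
(
    parameter_id  INT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
    table_catalog NVARCHAR(128) NOT NULL,      -- source database name
    table_name    NVARCHAR(128) NOT NULL,      -- source table name
    process_flag  BIT NOT NULL DEFAULT (1)     -- 1 = include in the load
);
```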
Prepare Azure Data Lake Storage Gen2 Container
We also need to create an ADLS2 container and a folder for our server. We should confirm that there is no existing data in the server-level folder.
Create Azure Data Factory Pipeline Resources
Now we are ready to begin creating our Azure Data Factory Pipeline. We will navigate to our Azure Data Factory resource and click on “Author and monitor”. Once the Azure Data Factory canvas loads, we will click on “Create Pipeline”.
Create Self Hosted IR
We will create our Self-Hosted Integration Runtime and verify that it shows a “Running” status. This will allow us to connect our on-premises resources to Azure.
Create Linked Services
Once our Self-Hosted IR is created, we will create all the required Linked Services, including SQL Server, Azure SQL Database, and Azure Data Lake Storage Gen2.
Create Datasets
Our pipelines will require three datasets:
- DS_ADLS2: This dataset will use the Parquet format and will allow us to partition our data into YY-MM-DD-HH folders.
- DS_SQLSERVER: This dataset will connect to our on-premises SQL Server.
- DS_ASQLDB_PIPELINE_PARAMETER: This dataset will connect to our pipeline parameter table in Azure SQL Database.
Create Azure Data Factory Pipelines
Now we can start creating our Azure Data Factory Pipelines:
P_Insert_Base_Table_Info
This pipeline will query the on-premises INFORMATION_SCHEMA.TABLES view as its source to get the table name and database (catalog) name for each table, and will then write the results to the pipeline parameter table in Azure SQL Database. The purpose of this pipeline is to populate the parameter table that drives the rest of our pipelines.
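As a rough sketch, the source query for this pipeline's copy activity could look like the following, run against each source database, with the column aliases matching the illustrative parameter table above:

```sql
-- Illustrative source query for P_Insert_Base_Table_Info.
-- Returns one row per base table in the current database; the results are
-- written to the pipeline parameter table in Azure SQL Database.
SELECT TABLE_CATALOG AS table_catalog,
       TABLE_NAME    AS table_name
FROM   INFORMATION_SCHEMA.TABLES
WHERE  TABLE_TYPE = 'BASE TABLE';
```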
P_SQL_to_ADLS
This pipeline will perform the SQL Server to ADLS2 data orchestration. It will use a Lookup activity to retrieve the rows of the pipeline parameter table and a ForEach activity to iterate over them. Within the ForEach activity, a Copy activity will copy each table's data from the on-premises SQL Server to ADLS2.
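A sketch of the Lookup activity's query, again assuming the illustrative dbo.pipeline_parameter table from earlier:

```sql
-- Illustrative Lookup query: return only the tables flagged for processing,
-- so the ForEach activity iterates over exactly these rows.
SELECT table_catalog,
       table_name
FROM   dbo.pipeline_parameter
WHERE  process_flag = 1;
```

Inside the ForEach, the Copy activity's source can then reference the current item with a dynamic expression, for example a query built from @{item().table_catalog} and @{item().table_name}, while the sink dataset builds the matching database-level and table-level folder path in ADLS2.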
Run the Data Factory Pipeline & Verify Azure Data Lake Storage Gen2 Objects
Finally, we can run the P_SQL_to_ADLS pipeline. Once it completes, we can verify that the data has been moved to Azure Data Lake Storage: we should see the appropriate database-level and table-level folders in ADLS2.
By following these steps, businesses can easily move their on-premises SQL Server data to Azure Data Lake Storage using Azure Data Factory. This allows for a structured, partitioned, and repeatable process for all servers in their SQL Server ecosystem.