In this article, we will explore how to connect an Azure SQL database and Google BigQuery through Azure Data Factory. By linking these two cloud resources, we can extract useful insights from datasets hosted across different data products.
Setting up Azure SQL
To begin, we need to set up an Azure SQL database. In the SQL Shack article “Create an Azure SQL Database with built-in sample data,” we can find detailed instructions on how to do this. Create a new Azure SQL database named “BigQueryFrontEnd” and build a table called “BQ_SP_Result_Set” using the provided script.
USE [BigQueryFrontEnd]
GO

-- Target table that will receive the rows copied from Google BigQuery
CREATE TABLE [dbo].[BQ_SP_Result_Set] (
    [ZIP] [varchar](500) NULL,
    [NAME] [varchar](500) NULL,
    [EMPFLAG] [varchar](500) NULL,
    [EMP_NF] [varchar](500) NULL,
    [EMP] [int] NULL,
    [QP1_NF] [varchar](500) NULL,
    [QP1] [int] NULL,
    [AP_NF] [varchar](500) NULL,
    [AP] [int] NULL,
    [EST] [int] NULL,
    [CITY] [varchar](500) NULL,
    [STABBR] [varchar](500) NULL,
    [CTY_NAME] [varchar](500) NULL
)
GO
Creating an Azure Data Factory
Next, we will create an Azure Data Factory that will link the Azure SQL and Google BigQuery resources. To do this, follow these steps:
- Set up an Azure account.
- Go to portal.azure.com/#home and click on “Data factories” to open the Data factories blade.
- On the Basics tab, select the subscription and resource group values, and set the Instance details.
- On the Git configuration tab, configure Git or choose to configure it later, and on the Networking tab select the Public endpoint connectivity method.
- Optionally add tags on the Tags tab.
- On the Review + create tab, click “Create”. (A sketch of the resulting resource definition follows this list.)
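For reference, the portal provisions the factory as an ordinary ARM resource. A minimal sketch of its JSON definition, with a hypothetical factory name and region, looks like this:

{
  "type": "Microsoft.DataFactory/factories",
  "apiVersion": "2018-06-01",
  "name": "bq-demo-adf",
  "location": "eastus",
  "identity": {
    "type": "SystemAssigned"
  },
  "properties": {}
}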
Once the Data Factory is created, click on “Author & Monitor” to open the Data Factory definition page.
Building Datasets
To connect the Azure SQL and Google BigQuery resources, we need to create a dataset for each of them. A dataset is a named view of data that references the data we want to use in our activities.
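Behind the designer, Data Factory stores each dataset as a small JSON document. The skeleton below is illustrative only; the angle-bracketed names are placeholders, not real values:

{
  "name": "<DatasetName>",
  "properties": {
    "type": "<ConnectorDatasetType>",
    "linkedServiceName": {
      "referenceName": "<LinkedServiceName>",
      "type": "LinkedServiceReference"
    },
    "typeProperties": {}
  }
}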
First, we will build a dataset for the BigQuery resource:
- On the Data Factory definition page, click on “New dataset”.
- Search for and select “bigquery” and click “Continue”.
- Create a linked service for the dataset by clicking on “New”.
- Provide the necessary values, such as the Project ID, Client ID, Client Secret, and Refresh Token (these fields appear in the JSON sketch after this list).
- Test the connection and click “Create”.
- Select the desired table from the dropdown and click “Preview data” to see a sample of the BigQuery resource data.
- Name the dataset “BQDemoBigQueryDataset” and add a description.
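For orientation, the JSON that Data Factory stores for this linked service and dataset looks roughly like the sketch below. The linked service name “BQDemoBigQueryLinkedService” and the angle-bracketed values are placeholders, and in practice the secrets belong in a secure store such as Azure Key Vault rather than inline:

{
  "name": "BQDemoBigQueryLinkedService",
  "properties": {
    "type": "GoogleBigQuery",
    "typeProperties": {
      "project": "<project-id>",
      "authenticationType": "UserAuthentication",
      "clientId": "<client-id>",
      "clientSecret": {
        "type": "SecureString",
        "value": "<client-secret>"
      },
      "refreshToken": {
        "type": "SecureString",
        "value": "<refresh-token>"
      }
    }
  }
}

{
  "name": "BQDemoBigQueryDataset",
  "properties": {
    "type": "GoogleBigQueryObject",
    "linkedServiceName": {
      "referenceName": "BQDemoBigQueryLinkedService",
      "type": "LinkedServiceReference"
    },
    "typeProperties": {
      "dataset": "<bigquery-dataset>",
      "table": "<bigquery-table>"
    }
  }
}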
Next, we will create a dataset for the Azure SQL resource:
- On the Data Factory definition page, click on “New dataset”.
- Name the dataset “BQDemoAzureSQLDataset” and click “New” to create a linked service.
- Provide the necessary values, such as the Server name, Database name, User name, and Password (see the JSON sketch after this list).
- Test the connection and click “Create”.
- Select the desired table from the dropdown and set the description.
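As before, the stored definitions boil down to two JSON documents. This is a rough sketch; the linked service name “BQDemoAzureSQLLinkedService” and the angle-bracketed connection values are placeholders:

{
  "name": "BQDemoAzureSQLLinkedService",
  "properties": {
    "type": "AzureSqlDatabase",
    "typeProperties": {
      "connectionString": "Server=tcp:<server>.database.windows.net,1433;Database=BigQueryFrontEnd;User ID=<user>;Password=<password>;Encrypt=true;"
    }
  }
}

{
  "name": "BQDemoAzureSQLDataset",
  "properties": {
    "type": "AzureSqlTable",
    "linkedServiceName": {
      "referenceName": "BQDemoAzureSQLLinkedService",
      "type": "LinkedServiceReference"
    },
    "typeProperties": {
      "schema": "dbo",
      "table": "BQ_SP_Result_Set"
    }
  }
}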
Building the Pipeline
Now that we have our datasets, we can build the pipeline that will copy data from the Google BigQuery resource to the Azure SQL resource:
- On the Data Factory definition page, click on “New pipeline”.
- In the Activities pane, drag “Copy data” onto the design canvas.
- Set the name and description for the pipeline.
- Configure the source dataset to be the “BQDemoBigQueryDataset” and choose the desired table.
- Configure the sink dataset to be the “BQDemoAzureSQLDataset” and choose “None” for the stored procedure name.
- Import the schemas and preview the source data.
- Validate the pipeline and click “Publish all” to save it (a trimmed JSON sketch of the published pipeline follows this list).
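The published pipeline reduces to a single Copy activity. In this sketch, the pipeline and activity names are hypothetical; the source and sink types are the ones Data Factory uses for these connectors:

{
  "name": "BQDemoPipeline",
  "properties": {
    "activities": [
      {
        "name": "CopyBigQueryToAzureSQL",
        "type": "Copy",
        "inputs": [
          {
            "referenceName": "BQDemoBigQueryDataset",
            "type": "DatasetReference"
          }
        ],
        "outputs": [
          {
            "referenceName": "BQDemoAzureSQLDataset",
            "type": "DatasetReference"
          }
        ],
        "typeProperties": {
          "source": {
            "type": "GoogleBigQuerySource"
          },
          "sink": {
            "type": "AzureSqlSink"
          }
        }
      }
    ]
  }
}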
At this point, we have successfully set up the Azure Data Factory and created a pipeline to copy data from Google BigQuery into Azure SQL.
Testing the Pipeline
To test the pipeline and verify that the data is being transferred correctly, we can run some queries on the target Azure SQL database:
-- Row-count check on the copied data
SELECT COUNT(ZIP) AS 'ZIP_CODE_RECORD_COUNT' FROM [dbo].[BQ_SP_Result_Set]

-- Inspect the copied rows in ascending ZIP-code order
SELECT * FROM [dbo].[BQ_SP_Result_Set] ORDER BY ZIP
The first query reports the number of records that landed in the table, and the second displays the data in ascending order of ZIP code.
Conclusion
By following the steps outlined in this article, we have successfully connected Azure SQL and Google BigQuery through Azure Data Factory. Although we currently cannot call a BigQuery stored procedure from an Azure SQL resource, future product enhancements may enable this functionality. In the meantime, these techniques allow us to link and extract insights from datasets hosted across different data products.