In this article, we will dive into the concept of SQL Server PolyBase external tables with Azure Blob Storage. PolyBase is a feature available in SQL Server 2016 and onwards that allows processing data using native SQL queries from external data sources. It is a database scoped configuration, which sets it apart from the popular Linked server feature.
PolyBase supports various data sources for external tables, including SQL Server, Oracle, MongoDB, Hadoop, Azure Blob Storage, Teradata, ODBC, and Generic types. In this article, we will focus on using Azure Blob Storage as the external data source.
To get started, we need to configure the PolyBase feature in SQL Server. If you are using SQL Server 2019, you can enable the PolyBase feature by running the following command:
EXEC sp_configure 'polybase enabled', 1;
GOAfter enabling the PolyBase feature, you need to set the connectivity to support Windows Azure Blob Storage (WASB driver) by running the following command:
EXEC sp_configure @configname = 'hadoop connectivity', @configvalue = 7;
GONext, you will need to obtain the access key for your Azure Storage Account. This can be done by navigating to the Storage account page, selecting Settings, and then Access keys. Make sure to note down the storage account name, container name, and access key.
With the necessary configurations in place, you can now create a new SQL database and the required master key and credential. The master key is used for encryption, and the credential is used to store the Azure Storage account access key. Here is an example script:
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'test@123';
USE YourDatabaseName;
CREATE DATABASE SCOPED CREDENTIAL [AzurePolyBase] WITH IDENTITY = 'TestAccount', SECRET = 'YourAccessKey';
Now that we have everything set up, we can proceed to create an external data source that points to the Azure Blob storage. Here is an example script:
CREATE EXTERNAL DATA SOURCE AzureStorage WITH (
TYPE = HADOOP,
LOCATION = 'wasbs://yourcontainer@yourstorageaccount.blob.core.windows.net',
CREDENTIAL = [AzurePolyBase]
);
After creating the external data source, we can define an external file format. This format specifies how the data in the external file is structured. Here is an example script:
CREATE EXTERNAL FILE FORMAT TextFile WITH (
FORMAT_TYPE = DelimitedText,
FORMAT_OPTIONS (FIELD_TERMINATOR = ',')
);
Finally, we can create an external table that represents the data in the Azure Blob storage. The external table syntax is similar to a regular SQL table, where we define the columns and their data types. Here is an example script:
CREATE EXTERNAL TABLE dbo.SQLExternal (
[Region] varchar(1000),
[Country] varchar(1000),
[Item Type] varchar(1000),
[Sales] varchar(1000)
) WITH (
LOCATION = '/Sales.txt',
DATA_SOURCE = AzureStorage,
FILE_FORMAT = TextFile
);
Once the external table is created, you can retrieve data from it using regular SQL queries. The data will be fetched from the Azure Blob storage and displayed in a tabular format. You can also view the execution plan, which will show that it is a remote query.
In addition to fetching data, you can also use PolyBase to insert data from an on-premises SQL Server into the Azure Blob storage. This can be useful for scenarios like data archival. To enable PolyBase export, you can run the following command:
EXEC sp_configure 'allow polybase export', 1;
Then, you can create an external table with the required columns and their data types, and insert data from the on-premises database into the external table.
In conclusion, SQL Server PolyBase with Azure Blob Storage provides a powerful way to process and retrieve data from external sources using native SQL queries. By leveraging the capabilities of Azure Blob Storage, you can easily integrate and analyze data from various sources within your SQL Server environment.