In Azure Data Lake Analytics, U-SQL provides a powerful way to analyze unstructured or semi-structured data. In this article, we will dive into the concept of structured database objects in Azure Data Lake Analytics and learn how to create and use them.
Understanding U-SQL Data Definition Language (DDL)
U-SQL DDL supports various database objects such as schemas, tables, indexes, statistics, views, functions, packages, procedures, assemblies, credentials, types, partitions, and data sources. These objects provide a mechanism to organize and manage data in Azure Data Lake Analytics.
Similar to SQL Server, every U-SQL database has a default schema named “dbo”, and you can create additional schemas as needed. Tables in U-SQL can be either managed or external: a managed table stores its data natively in Azure Data Lake Analytics, while an external table refers to data that stays in an external data repository.
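As a minimal sketch of the schema setup described above, the following script creates a database and an additional schema. The names “SalesDb” and “Reporting” are hypothetical and chosen only for illustration:
// Hypothetical database and schema names, for illustration only.
CREATE DATABASE IF NOT EXISTS SalesDb;
USE DATABASE SalesDb;
CREATE SCHEMA IF NOT EXISTS Reporting;
Objects created without a schema prefix land in “dbo”, while a prefixed name such as Reporting.Sales would place the object in the new schema.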
Let’s take a look at an example of creating a managed table in U-SQL:
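CREATE TABLE Sales (
    ProductId int,
    ProductName string,
    Quantity int,
    Price decimal,
    // Clustered index and distribution scheme for the managed table.
    INDEX idx_Sales CLUSTERED (ProductId ASC)
    DISTRIBUTED BY HASH (ProductId)
);
In this example, we create a table named “Sales” with four columns: ProductId, ProductName, Quantity, and Price. The column types are C# types, as everywhere in U-SQL. The definition also declares a clustered index on ProductId and hash-distributes the rows by the same column, which a managed table generally needs before it can hold data. This table will be used to store structured data for further analysis.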
Loading Data into U-SQL Tables
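Once we have created a table, we can load data into it using the INSERT statement. The data is typically read from files stored in the data lake storage account with an EXTRACT statement, which produces a rowset that INSERT can consume.
Here’s an example of loading data into the “Sales” table:
@input =
    EXTRACT ProductId int, ProductName string, Quantity int, Price decimal
    FROM "/input/sales.csv" // hypothetical input path
    USING Extractors.Csv();
INSERT INTO Sales
SELECT ProductId, ProductName, Quantity, Price
FROM @input
WHERE Price > 0;
In this example, the EXTRACT statement reads a CSV file into a rowset variable called “@input” (the file path is only a placeholder). The SELECT statement then reads from “@input”, keeps only the rows with a positive price, and inserts them into the “Sales” table. We can apply other filters or transformations in the same SELECT before the data reaches the table.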
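To illustrate the filters and transformations mentioned above, here is a hedged sketch of a load that skips a header row, trims product names, and drops incomplete rows. The input path and the assumption that the file has exactly one header row are hypothetical. The “Sales” table is assumed to exist with the four columns shown earlier:
// Hypothetical input path; assumes the file starts with one header row.
@input =
    EXTRACT ProductId int,
            ProductName string,
            Quantity int,
            Price decimal
    FROM "/input/sales_with_header.csv"
    USING Extractors.Csv(skipFirstNRows: 1);
// C# expressions can be used directly in SELECT and WHERE.
INSERT INTO Sales
SELECT ProductId,
       ProductName.Trim() AS ProductName,
       Quantity,
       Price
FROM @input
WHERE Price > 0 && Quantity > 0 && !string.IsNullOrEmpty(ProductName);
Because U-SQL expressions are C#, the null check and the Trim() call are ordinary .NET string operations rather than SQL functions.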
Querying Data from U-SQL Tables
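Once the data is stored in a structured table format, we can easily query it using the SELECT statement. U-SQL provides a SQL-like syntax for querying data, with one notable difference: the result of a SELECT is assigned to a rowset variable instead of being returned directly.
Here’s an example of querying data from the “Sales” table:
@output =
    SELECT ProductName, SUM(Quantity) AS TotalQuantity
    FROM Sales
    GROUP BY ProductName;
In this example, we group the data by the “ProductName” column, calculate the total quantity for each product, and store the result in the rowset variable “@output”. U-SQL only allows ORDER BY in a SELECT when it is combined with a FETCH clause, so the final result is normally sorted with the ORDER BY clause of the OUTPUT statement.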
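As a sketch of how ordering fits into a U-SQL query, the following script keeps only the five best-selling products by pairing ORDER BY with FETCH. The rowset names, the row count, and the output path are hypothetical, and the “Sales” table is assumed to exist as defined earlier:
// Aggregate first, then pick the top rows from the aggregated rowset.
@totals =
    SELECT ProductName, SUM(Quantity) AS TotalQuantity
    FROM Sales
    GROUP BY ProductName;
// ORDER BY inside SELECT is paired with FETCH to limit the result.
@top5 =
    SELECT ProductName, TotalQuantity
    FROM @totals
    ORDER BY TotalQuantity DESC
    FETCH 5 ROWS;
// Hypothetical output path; ORDER BY here controls the order in the file.
OUTPUT @top5
TO "/output/TopProducts.csv"
ORDER BY TotalQuantity DESC
USING Outputters.Csv();
The ORDER BY is repeated on the OUTPUT statement because a rowset has no guaranteed order until it is written out.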
Exporting Data from U-SQL Tables
If we need to export the queried data, we can use the OUTPUT statement to write a rowset to a file in the Azure Data Lake storage account.
Here’s an example of exporting the aggregated data from the “Sales” table:
OUTPUT @output
TO "/output/SalesReport.csv"
ORDER BY TotalQuantity DESC
USING Outputters.Csv();
In this example, “@output” is the rowset variable that holds the query result, the TO clause specifies the output location, and Outputters.Csv() defines the output format as CSV. The ORDER BY clause sorts the rows by total quantity in descending order as they are written. The results are stored in a file named “SalesReport.csv” in the “/output” directory of the data lake storage account.
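By default the CSV outputter writes data rows only. If a header row is wanted, a variant along the following lines should work; it is a sketch that rebuilds the “@output” rowset so the script is self-contained, and the file name is hypothetical:
@output =
    SELECT ProductName, SUM(Quantity) AS TotalQuantity
    FROM Sales
    GROUP BY ProductName;
// Hypothetical file name; outputHeader adds the column names as the first row.
OUTPUT @output
TO "/output/SalesReportWithHeader.csv"
ORDER BY TotalQuantity DESC
USING Outputters.Csv(outputHeader: true);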
Conclusion
In this article, we explored the concept of structured database objects in Azure Data Lake Analytics and learned how to create, load data into, query, and export data from U-SQL tables. These database objects provide a powerful way to organize and analyze unstructured or semi-structured data. By leveraging U-SQL’s SQL-like syntax, developers can easily perform complex data analysis tasks in Azure Data Lake Analytics.
Stay tuned for more articles on Azure Data Lake Analytics and SQL Server!