Published on

February 3, 2021

Understanding U-SQL Scripts and Executing Them Using Visual Studio

In this article, we will explore U-SQL scripts and how to execute them using Visual Studio. U-SQL is a big data query language that combines the syntax of t-SQL and the power of C# language. It allows you to extract and transform data in the required format, offering big data processing from gigabyte to petabyte scale.

To get started, you need to have Visual Studio 2019 installed with the Azure Data Lake and Stream Analytics Tools. You can install these tools by navigating to Visual Studio Installer -> Workloads -> Data Storage and processing -> Azure Data lake and Stream Analytics.

Once you have the necessary tools installed, you can create a new U-SQL project in Visual Studio. This project will contain your U-SQL scripts. Specify the project name, location, and solution, and Visual Studio will create the project for you.

In the project, you will find a script editor where you can write your U-SQL scripts. Before writing the script, you need to set the input and output folders path in Visual Studio. Navigate to Tools -> Options -> Azure Data Lake and specify the default path for Data Root and runtime SDK directory. You can also configure additional settings such as file preview and job graph.

Now, you can start writing your U-SQL script. The script begins with defining a row-set variable and specifying the input column names and their data types. You can use extractors like Csv, Text, or Tsv to retrieve data from different formats. For example:

@Sales = EXTRACT [Region] string, [Country] string, [ItemType] string, [SalesChannel] string, [OrderPriority] string, [OrderDate] DateTime, [OrderID] int, [ShipDate] DateTime, [UnitsSold] int, [UnitPrice] decimal, [UnitCost] decimal, [TotalRevenue] decimal, [TotalCost] decimal, [TotalProfit] decimal
FROM "/Input/SalesRecords.csv"
USING Extractors.Csv(skipFirstNRows: 1);

After extracting the data, you can perform data transformation operations using functions similar to T-SQL. For example, you can calculate averages, sums, or group data by a specific column. Here’s an example:

@TransformData = SELECT Country, AVG([UnitsSold]) AS AverageUnitSold, SUM([TotalRevenue]) AS SumofTotalRevenue, SUM([TotalCost]) AS SumofTotalCost, SUM([TotalProfit]) AS SumofTotalProfit
FROM @Sales
GROUP BY Country;

Once you have transformed the data, you can save it into a specified output file using the OUTPUT command. You can also specify the order of the data and format the output file. Here’s an example:

OUTPUT @TransformData TO "output/SalesData.csv" ORDER BY Country USING Outputters.Csv(outputHeader: true);

To execute the U-SQL script, click on the “Submit” button in Visual Studio. This will open a command prompt and provide information about the execution. You can also view the job graph and preview the output file.

If you want to process multiple files together, you can specify the file paths in the script. For example:

FROM "/Input/First/Sales1.csv", "/Input/Second/Sales2.csv"

Visual Studio also allows you to integrate with your Azure account. You can connect to your Azure Data Lake Analytics account and deploy your U-SQL project in Azure. This allows you to scale your data processing and take advantage of the Azure cloud infrastructure.

In conclusion, Visual Studio provides a powerful environment for writing and executing U-SQL scripts for Azure Data Lake Analytics. It allows you to extract, transform, and process big data efficiently. By integrating with Azure, you can take advantage of the scalability and resources offered by the cloud. Stay tuned for more articles on U-SQL and Azure Data Lake Analytics!

Click to rate this post!
[Total: 0 Average: 0]

Let's work together

Send us a message or book free introductory meeting with us using button below.