Published on December 15, 2020

Executing U-SQL Jobs on Azure Data Lake Analytics

In this article, we will explore the process of executing U-SQL jobs on Azure Data Lake Analytics. U-SQL is a query language that combines SQL-like declarative queries with C# expressions and is used to process big data stored in Azure Data Lake Storage. By executing U-SQL jobs, we can perform data processing tasks such as transformation, aggregation, and analysis.

Introduction

Before we dive into the execution process, let’s understand the basic concept of a U-SQL job. A U-SQL job is essentially a U-SQL script, together with any code-behind files and referenced assemblies, that is submitted to an Azure Data Lake Analytics account for execution. Jobs can be developed and tested locally in Visual Studio against local data, and once the script works locally it can be submitted to the Data Lake Analytics account in the cloud.
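
As a point of reference, here is a minimal sketch of what such a script can look like. The file paths, schema, and filter value below are made up for illustration and are not tied to any particular application:

// Read a tab-separated file from the Data Lake Store account
@searchlog =
    EXTRACT UserId int,
            Start DateTime,
            Region string,
            Query string
    FROM "/input/SearchLog.tsv"
    USING Extractors.Tsv();

// Keep only the rows for one region
@result =
    SELECT UserId, Query
    FROM @searchlog
    WHERE Region == "en-gb";

// Write the filtered rows back to the store as a CSV file
OUTPUT @result
    TO "/output/SearchLog-filtered.csv"
    USING Outputters.Csv();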

Deploying U-SQL Jobs

To deploy a U-SQL job on Azure Data Lake Analytics, we need to have a U-SQL script that is ready for execution. In the previous parts of this article series, we created a sample U-SQL application with pre-built scripts that we tested locally. Assuming you have this setup on your local machine, we can proceed with the deployment process.

By default, Visual Studio targets the local run environment for job execution. To execute the job in the cloud, we need to switch to the Azure Data Lake Analytics account instead, which is done by selecting that account from the ADLA Account dropdown in Visual Studio.

Before executing the job, it is important to consider the AU (Analytics Unit) allocation. The number of AUs assigned to a job determines both how quickly it can run and how much it costs. For larger volumes of data, increasing the AU allocation can shorten the execution time, provided the job has enough parallel work to use the extra units. This setting can be adjusted in Visual Studio when the job is submitted.
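
To put rough numbers on this (the per-AU price varies by region and over time, so treat the figures as illustrative): an AU corresponds to roughly two CPU cores and six GB of memory, and usage is billed in AU-hours. A job that runs for 30 minutes with 10 AUs therefore consumes 10 × 0.5 = 5 AU-hours. Raising the AU count shortens the wall-clock time only if the job parallelizes well; otherwise the extra units sit idle and simply add to the cost.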

Once the configuration changes are made, we can submit the job for execution on the cloud account. The execution progress and status can be monitored through the output window in Visual Studio.

Analyzing Job Execution

After the job execution is complete, we can analyze the job’s execution statistics and performance metrics. Visual Studio provides a job graph that displays the state of each step in the job and the overall execution flow, which gives insight into where the job spends its time.

In more complex jobs, a single stage may be split into multiple vertexes that run in parallel. The data flow and throughput between vertexes can be viewed to gain a deeper understanding of the job’s performance, and these details help identify where the job can be optimized.
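
As a rough illustration of where multiple vertexes come from, consider a script that aggregates a large input. The paths and schema below are hypothetical; the point is that the aggregation stage of such a job is typically split into several vertexes, each working on a partition of the data:

// Hypothetical sales data; a large input like this is read in parallel
@sales =
    EXTRACT Region string,
            Amount decimal
    FROM "/input/sales.csv"
    USING Extractors.Csv(skipFirstNRows: 1);

// The grouped aggregation usually compiles into a stage with many vertexes
@totals =
    SELECT Region,
           SUM(Amount) AS TotalAmount,
           COUNT(*) AS OrderCount
    FROM @sales
    GROUP BY Region;

OUTPUT @totals
    TO "/output/sales-by-region.csv"
    USING Outputters.Csv(outputHeader: true);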

Additionally, the input files read by the job and the output files it generates can be accessed. These files are stored in the Azure Data Lake Storage account and can be viewed in Visual Studio. The file formats vary depending on the extractors and outputters used in the script.
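
For example, the output format is controlled by the outputter named in each OUTPUT statement rather than by any account-level setting. A small self-contained sketch (the rowset and paths are made up) that writes the same data in three formats:

// A tiny in-line rowset, just to have something to write out
@result =
    SELECT *
    FROM (VALUES ("Contoso", 1500.0m), ("Fabrikam", 2700.0m)) AS D(Customer, Amount);

// The outputter chosen in each OUTPUT statement determines the file format
OUTPUT @result TO "/output/result.csv" USING Outputters.Csv(outputHeader: true);
OUTPUT @result TO "/output/result.tsv" USING Outputters.Tsv();
OUTPUT @result TO "/output/result.txt" USING Outputters.Text(delimiter: ';');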

Downloading Executed Jobs

In certain situations, it may be necessary to download an executed job from the Azure Data Lake Analytics account to a local environment. This can be done using the download button available in the job execution window. The downloaded files include the U-SQL script and its dependent files.

These downloaded files can be added to a version control repository for future reference. Visual Studio provides options to add the solution to version control, allowing for better collaboration and tracking of changes.

Conclusion

In this article, we explored the process of executing U-SQL jobs on Azure Data Lake Analytics. We learned how to deploy U-SQL jobs from a local machine to the cloud environment. We also discussed the importance of AU allocation and how to analyze job execution metrics. Additionally, we looked at the process of downloading executed jobs and adding them to version control.

By mastering the execution process, developers can leverage the power of Azure Data Lake Analytics to process and analyze big data efficiently.
