In today’s data-driven world, companies are constantly generating and storing massive amounts of data. This data can be unstructured, semi-structured, or structured. Data professionals are always looking for efficient ways to handle and analyze this diverse data. While traditional data warehouses and analytics tools can handle this task, they often require time-consuming ETL processes on terabytes and petabytes of data before analysis can be performed.
Enter Azure Data Explorer (ADX), a fast, scalable, and fully managed data analytics service for log, telemetry, and streaming data. ADX allows users to pull together, store, and analyze diverse data quickly. It enables users to query terabytes of data in seconds and supports fast ad-hoc queries over varied data. ADX uses a SQL-like query language called Kusto query language (KQL) to analyze fast-flowing data from IoT devices, applications, websites, and more. KQL offers a wide range of functions and operators for aggregation, filtering, and even built-in machine learning features like clustering and regression.
ADX works on the principle of isolation between Compute and Storage, using volatile SSD storage as a cache and persistent storage in Azure Blob Storage. It is a fully managed Platform as a Service (PaaS) that allows users to focus solely on their data and queries. One of the key benefits of ADX is its support for time series analysis, with a vast array of functions to analyze trends and anomalies.
How does Azure Data Explorer work?
To get started with Azure Data Explorer, you need to create a cluster and one or more databases within it. Once the cluster and database are set up, you can load (ingest) data into the database and run queries for data exploration.
Creating an Azure Data Explorer cluster
Creating a cluster is a straightforward process. Simply sign in to the Azure Portal, search for “Azure Data Explorer cluster,” and click on Create. Provide basic details such as your subscription, resource group, cluster name, and pricing details. ADX clusters are billed on a per-minute basis and are charged only when they are running. You can choose between Compute-optimized instances (D series) or Storage-optimized instances (DS series) based on your workload needs.
Creating a database in the cluster
Once the cluster is created, you can proceed to create a database within it. Specify the name of the database, retention and cache period, and hit the Create button. ADX allows you to decide how long you want to keep the data in the cache, whether hot or cold.
Ingesting data into Azure Data Explorer
After the database is created, you can load data into it. ADX supports various methods of data ingestion, such as Event Hub, Blob Storage, IoT Hub, Kafka connector, and Azure Data Factory. For this example, we will demonstrate how to load sample data from Azure Blob Storage into the ADX database using the Kusto query language.
.create table CrimeData (CrimeDate: datetime, CrimeTime: datetime, Location: string, Description: string, Weapon: string, Coordination: string)
.ingest into table CrimeData 'https://azsqlshackstorage.blob.core.windows.net/mycontainer/CrimeData.csv?sv=2019-02-02&ss=bfqt&srt=sco&sp=rwdlacup&se=2020-02-21T05:38:14Z&st=2020-02-20T21:38:14Z&spr=https&sig=adoJ%2BM494Vq6lwCvL0ZjWP%2BtaZsTNMcPtnIh%2BuPBlQw%3D' with (ignoreFirstRecord=true)
Querying the database using Kusto language
Once the data is ingested, you can start querying the database using the Kusto query language. KQL is similar to SQL and is also used in Azure Log Analytics and Azure Monitor. You can run queries to retrieve specific data, perform aggregations, filter data, and even visualize the results using various chart types.
CrimeData | take 10
This query retrieves the first 10 rows from the CrimeData table.
CrimeData | where Weapon == "FIREARM" | count
This query returns the count of records where the Weapon column equals “FIREARM”.
CrimeData | where Weapon == "HANDS" and CrimeDate == "11/12/2016" | top 2 by CrimeTime desc
This query retrieves the top 2 rows for crimes that happened on 11/12/2016 with the weapon as “HANDS”.
It’s important to note that ADX does not allow deleting rows from a table like in traditional SQL databases. However, you can delete a table using the drop command.
Conclusion
Azure Data Explorer is a powerful data analytics service that allows users to efficiently handle and analyze diverse data. With its fast data ingestion, scalable architecture, and support for time series analysis, ADX is a valuable tool for data professionals. By leveraging the Kusto query language, users can easily query and explore their data, gaining valuable insights and making informed decisions.