Published on April 25, 2021

Understanding Azure Storage and Data Lake Storage Gen2

Azure Storage is Microsoft's cloud storage service for keeping different types of data in a durable, highly available, and scalable manner. It provides features such as data redundancy, security, and accessibility from anywhere in the world. Built on top of Azure Storage, Data Lake Storage Gen2 combines the capabilities of Azure Data Lake Storage Gen1 with Azure Blob storage.

Azure Storage Features

Azure Storage offers several key features:

  • Durable and highly available: Azure Storage keeps multiple copies of your data within the primary region and, depending on the redundancy model you choose, in a secondary region as well, to ensure data durability and high availability.
  • Secure: Data in Azure Storage is protected by encryption policies and strict access control methods to prevent unauthorized access.
  • Scalable: Azure Storage is designed to scale to massive amounts of data while meeting the storage and performance needs of your applications.
  • Managed: Azure takes care of hardware maintenance, updates, and critical issues, allowing you to focus on managing your data.
  • Accessible: Data in Azure Storage can be accessed from anywhere in the world over HTTP or HTTPS with the appropriate permissions.

Data Redundancy Models

Data redundancy is crucial for maintaining high availability and durability of stored data. Azure Storage offers four different redundancy models:

  • Locally redundant storage (LRS): Data is replicated three times within a single data center in the primary region.
  • Zone-redundant storage (ZRS): Data is replicated synchronously across three Azure availability zones in the primary region.
  • Geo-redundant storage (GRS): Data is copied synchronously three times within a single physical location in the primary region (as in LRS) and then asynchronously to a single physical location in the secondary region.
  • Geo-zone-redundant storage (GZRS): Data is copied synchronously across three Azure availability zones in the primary region and asynchronously to a single physical location in the secondary region.
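
As a rough illustration of how the four models differ, the sketch below encodes each one by two properties described above: whether copies are spread across availability zones, and whether a copy is kept in a secondary region. The `RedundancyModel` class and the `choose_redundancy` helper are hypothetical names made up for this example; only the SKU strings (`Standard_LRS` and so on) are the names Azure uses for standard storage accounts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RedundancyModel:
    sku: str              # SKU name used when creating a standard storage account
    zone_redundant: bool  # copies spread across availability zones in the primary region
    geo_redundant: bool   # asynchronous copy kept in a secondary region


# Illustrative lookup table built from the list above.
MODELS = {
    "LRS": RedundancyModel("Standard_LRS", zone_redundant=False, geo_redundant=False),
    "ZRS": RedundancyModel("Standard_ZRS", zone_redundant=True, geo_redundant=False),
    "GRS": RedundancyModel("Standard_GRS", zone_redundant=False, geo_redundant=True),
    "GZRS": RedundancyModel("Standard_GZRS", zone_redundant=True, geo_redundant=True),
}


def choose_redundancy(survive_zone_outage: bool, survive_region_outage: bool) -> str:
    """Return the model whose properties match the two requirements exactly."""
    for name, model in MODELS.items():
        if (model.zone_redundant == survive_zone_outage
                and model.geo_redundant == survive_region_outage):
            return name
    raise ValueError("no matching redundancy model")


if __name__ == "__main__":
    print(choose_redundancy(survive_zone_outage=True, survive_region_outage=False))
```

This deliberately ignores cost and the read-access variants of the geo-redundant models, so treat it as a way to reason about the options rather than a recommendation engine.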

Types of Azure Storage

Azure Storage includes different data services:

  • Azure Blobs: Stores massive amounts of unstructured data and is accessible via HTTP/HTTPS.
  • Azure Files: Provides fully managed file shares in the cloud using SMB and NFS protocols.
  • Azure Queues: Stores large numbers of messages and is accessible using HTTP/HTTPS.
  • Azure Tables: Stores non-relational structured data in a key/attribute store with a schemaless design.
  • Azure Disks: Provides block-level storage volumes for Azure Virtual Machines.

Azure Blob Storage

Azure Blob Storage is used to store different types of unstructured data. It has various applications, such as serving documents or images to web browsers, streaming video and audio files, and storing data for analytics, backup, and archival purposes.

To use Blob Storage, you first need to create a storage account. Inside the storage account, you can create one or more containers to organize your blobs. Each blob has a unique address made up of the storage account's blob endpoint, the container name, and the blob name.
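
To make the addressing scheme concrete, here is a minimal sketch that assembles a blob URL from its parts. It assumes the default public endpoint suffix `blob.core.windows.net`; the `blob_url` function and the account, container, and blob names are placeholders invented for this example.

```python
from urllib.parse import quote


def blob_url(account: str, container: str, blob_name: str) -> str:
    """Build the HTTPS address of a blob on the default public blob endpoint."""
    # Percent-encode the blob name but keep "/" so virtual folder paths stay readable.
    encoded_name = quote(blob_name, safe="/")
    return f"https://{account}.blob.core.windows.net/{container}/{encoded_name}"


if __name__ == "__main__":
    # Placeholder names, not a real storage account.
    print(blob_url("mystorageaccount", "images", "2021/logo final.png"))
    # prints https://mystorageaccount.blob.core.windows.net/images/2021/logo%20final.png
```

Knowing the URL is not enough to read the blob: the request still needs appropriate permissions unless the container has been made publicly readable.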

There are three types of blobs that can be created:

  • Block Blobs: Store text and binary data and can be composed of multiple blocks.
  • Append Blobs: Similar to block blobs but optimized for append operations.
  • Page Blobs: Optimized for random read and write operations and commonly used as disks for Azure virtual machines.

Azure Blob Storage also offers different access tiers, including Hot, Cool, and Archive, allowing you to store blob data in the most cost-effective manner based on its usage patterns.
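
One way to think about tier selection is as a simple rule over how recently the data was read. The sketch below is only a heuristic: the function name `suggest_access_tier` is hypothetical, and the 30-day and 180-day thresholds are assumptions picked for illustration, so adjust them to your own usage patterns and pricing.

```python
def suggest_access_tier(days_since_last_access: int, needs_immediate_reads: bool = True) -> str:
    """Suggest Hot, Cool, or Archive from a rough picture of the access pattern."""
    if days_since_last_access < 0:
        raise ValueError("days_since_last_access must be non-negative")
    # Archive is an offline tier: data has to be rehydrated before it can be read,
    # so only suggest it when the caller can tolerate that delay.
    if days_since_last_access >= 180 and not needs_immediate_reads:
        return "Archive"
    # Assumed threshold: data untouched for a month is treated as infrequently accessed.
    if days_since_last_access >= 30:
        return "Cool"
    return "Hot"


if __name__ == "__main__":
    for days, immediate in [(3, True), (45, True), (400, True), (400, False)]:
        print(days, immediate, suggest_access_tier(days, immediate))
```

In practice this kind of rule is usually expressed as a lifecycle management policy on the storage account rather than in application code.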

Data Lake Storage Gen2

Data Lake Storage Gen2 combines the capabilities of Azure Data Lake Storage Gen1 with Azure Blob storage. It provides file system semantics, file-level security, and scalability inherited from Gen1, while leveraging Blob storage for low cost, tiered access, high availability, and durability.

Data Lake Storage Gen2 uses a hierarchical namespace, which allows objects to be organized into a hierarchy of directories and subdirectories, similar to a file system. This hierarchical structure improves performance and makes it easier to manage and access data.
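
The performance benefit is easiest to see with a directory rename. The toy model below is an in-memory illustration, not a description of how Azure implements either namespace: in a flat namespace a "directory" is only a shared name prefix, so renaming it touches every object, while in a hierarchical namespace the directory is a real node that can be re-linked in one step. Both function names are hypothetical.

```python
def rename_prefix_flat(blobs: dict, old_prefix: str, new_prefix: str) -> int:
    """Flat namespace: rewrite the name of every blob that shares the prefix."""
    matching = [name for name in blobs if name.startswith(old_prefix)]
    for name in matching:
        blobs[new_prefix + name[len(old_prefix):]] = blobs.pop(name)
    return len(matching)  # number of objects touched


def rename_dir_hierarchical(tree: dict, old_name: str, new_name: str) -> int:
    """Hierarchical namespace: re-link one directory node; children are untouched."""
    tree[new_name] = tree.pop(old_name)
    return 1  # a single metadata operation in this toy model


if __name__ == "__main__":
    flat = {"raw/2021/a.csv": b"1", "raw/2021/b.csv": b"2", "curated/c.csv": b"3"}
    tree = {"raw": {"2021": {"a.csv": b"1", "b.csv": b"2"}}, "curated": {"c.csv": b"3"}}
    print(rename_prefix_flat(flat, "raw/", "landing/"))      # touches 2 objects
    print(rename_dir_hierarchical(tree, "raw", "landing"))   # touches 1 node
```

With millions of files under a directory, the difference between touching every object and touching one node is what makes rename and delete operations much cheaper for analytics jobs.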

Key features of Data Lake Storage Gen2 include:

  • Performance: Data can be analyzed without the need to move or transform it, resulting in improved job performance.
  • Easier Management: Files can be organized through directories and subdirectories.
  • Security: Supports access control lists (ACLs) and POSIX-style permissions on directories and files for fine-grained access control.
  • Cost effectiveness: Built on top of Blob Storage, resulting in lower storage and transaction costs.
  • Optimized driver: Allows applications and frameworks to access data in Azure Blob Storage using the Azure Blob File System (ABFS) driver without writing extra code.
  • Hadoop compatible access: Compatible with Apache Hadoop environments like Azure HDInsight, Azure Databricks, and Azure Synapse Analytics.
  • Scalability: Can handle multiple petabytes of data with high throughput.
  • Multiple usage: Can be used as both Blob Storage and Data Lake Storage.
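
The last few points can be illustrated by how the same account is addressed. The sketch below assumes the default public endpoint suffixes and the `abfss://` URI scheme used by the ABFS driver over TLS; the helper names and the account, file system, and path values are placeholders invented for this example.

```python
def abfss_uri(account: str, file_system: str, path: str = "") -> str:
    """Build an ABFS (TLS) URI of the form abfss://<file_system>@<account>.dfs.core.windows.net/<path>."""
    return f"abfss://{file_system}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


def account_endpoints(account: str) -> dict:
    """Return the two endpoints through which one account can be reached."""
    return {
        "blob": f"https://{account}.blob.core.windows.net",  # Blob Storage APIs
        "dfs": f"https://{account}.dfs.core.windows.net",    # Data Lake Storage Gen2 APIs
    }


if __name__ == "__main__":
    # Placeholder names, not a real storage account.
    print(abfss_uri("mydatalake", "raw", "/sales/2021/data.parquet"))
    print(account_endpoints("mydatalake"))
```

A URI like this is what you would typically pass to a Hadoop-compatible framework as an input or output path, while tools written against the Blob Storage APIs keep using the blob endpoint.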

Conclusion

Azure Storage and Data Lake Storage Gen2 are powerful services that provide scalable, durable, and highly available storage solutions for various types of data. Understanding their features and capabilities can help you make informed decisions when it comes to storing and managing your data in the cloud.

Thank you for reading!
