Spotlight on: Azure Compute Services

A core feature of any public cloud provider is compute, from heavyweight offerings like virtual machines through to serverless technology in the form of functions, and Azure is no exception. In fact, the list of compute services in Azure is extensive. Here we will explore those services, describing each one, giving reasons why you might choose it, and suggesting some use cases.

Virtual Machines

A great place to start is Virtual Machines. VMs have been the staple of on-prem and co-lo solutions for many years, so it was only natural that public clouds, Azure included, would offer them. Falling under the category of infrastructure as a service (IaaS), a VM gives the administrator OS access (not the host OS, just the guest), meaning fine-grained environment settings can be made. However, with power comes responsibility – the administrator is responsible for updating the OS and ensuring security settings are adequate.

You can run a variety of Linux or Windows VMs from the Azure Marketplace or use your own custom images. For those who have larger workloads, need to build in resiliency or have varying levels of traffic, Azure offers Virtual Machine Scale Sets (VMSS), which can be configured to create more VM instances based on metrics such as CPU use.

Choosing VMs as your compute solution could be driven by a desire to make the quickest migration from on-prem to cloud (known as lift and shift), or by a legacy application that would be complex to move to a more cloud-native solution.

Azure App Service

Azure App Service allows you to host HTTP(S)-based services (websites and APIs) without the complexity of maintaining VMs or having to run Docker images. This is a platform as a service (PaaS) that supports multiple programming languages including PHP, Java, Node.js and .NET. With this service, you can run on Windows or Linux, but the OS is abstracted away and all you are left with is an environment to host and run your applications. There are handy features like continuous integration and deployment with Azure DevOps, GitHub, Bitbucket, Docker Hub or Azure Container Registry.

Organisations use Azure App Service to deliver website front ends and mobile backends, or to provide RESTful APIs for integration with other systems. Support for multiple programming languages, integration with Visual Studio and VS Code, and the ability to work within DevOps pipelines make Azure App Service popular and familiar for developers. Patching is done automatically, and you can scale your application automatically to meet changing demand.

Azure Functions

Next up, we have Azure Functions. Functions are what is known as a serverless offering, in which code runs when a trigger such as an HTTP request, message queue, timer or event initiates it. This is perfect for short-lived, sporadic workloads because you are only charged while your code executes in response to a trigger, and not while the function is idle.

Azure Functions supports multiple programming languages including Java, JavaScript, Python and C#. Generally, functions are stateless, meaning they hold no information from one trigger to the next; however, Durable Functions can retain information for future processing. Durable Functions are an extension of Azure Functions and have practical uses in various application patterns such as function chaining and fan-out/fan-in.
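To make this concrete, here is a minimal sketch of an HTTP-triggered function using the Python programming model (v2). The route name and greeting are purely illustrative, and a function app project is assumed to already exist:

    # function_app.py - a minimal HTTP-triggered Azure Function (Python v2 model)
    import azure.functions as func

    app = func.FunctionApp(http_auth_level=func.AuthLevel.ANONYMOUS)

    @app.route(route="hello")
    def hello(req: func.HttpRequest) -> func.HttpResponse:
        # This code only runs when the HTTP trigger fires, which is also
        # the only time you are billed for execution.
        name = req.params.get("name", "world")
        return func.HttpResponse(f"Hello, {name}!", status_code=200)

Queue, timer and event triggers follow the same pattern with different decorators.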

Functions generally play well for short-lived tasks: there is no need to stand up other compute services only to have them sit idle for the majority of the time. Use cases include orchestrating some automation in a solution or initiating data processing when new data becomes available. Generally, functions form part of a larger, loosely coupled architecture, where services are independent, allowing for modular development of a single part without affecting other parts. This enables resilience in the design, such as incorporating message queues, so if one part of the system becomes temporarily unavailable, the app as a whole can continue to run.

Azure Kubernetes Service

Kubernetes is a container orchestration system. Originally designed by Google, it is now an open-source solution that has become the de facto way to deploy, manage and scale containerised applications in the form of Kubernetes clusters. Kubernetes (also known as K8s) is an ideal solution for running apps using the microservices model.

Azure Kubernetes Service (AKS) is Azure's offering of Kubernetes in the cloud. AKS manages much of the control plane for you and allows you to focus on deploying your application quickly. Being an Azure service, it plays nicely with other areas of Azure such as identity, networking, monitoring and more.

Almost any type of application can be run in containers, and when there is a need to manage and monitor multiple containers for scale, security and load balancing, AKS provides these advantages. Essentially, it provisions containerised microservices in a more sophisticated way than deploying and managing containers individually.

Azure Container Apps

For other container workloads that do not require all the features of Kubernetes, there is Azure Container Apps. It is a fully autoscaling solution, including scaling down to zero instances, which makes it a serverless offering where required. You can autoscale based on HTTP traffic, CPU or memory load, or run event-based processing such as on-demand, scheduled or event-driven jobs.

Under the hood, Azure Container Apps is powered by AKS, but it is simplified so that deployment and management are a lot easier. When deploying your container, Azure Container Apps can create the TLS certificate, meaning you can use your application securely from the outset with no additional configuration. Dapr (Distributed Application Runtime) is also included in the service, allowing for easier management of application state, secrets, inter-service invocation and more. With Azure Container Apps, you can deploy into your virtual network, giving you many options regarding routing, DNS and security.

Azure Container Apps are great when working on a multi-tiered project, such as a web front-end, a back-end, and a database tier in a microservices architecture.

Azure Container Instances

Another service for containers in Azure is Azure Container Instances (ACI). This service is much more basic than Azure Kubernetes Service or Azure Container Apps. Creating an instance is simple and allows you to run Docker containers in a managed, serverless cloud environment without the need to set up VMs, clusters or orchestrators.

If using Linux containers, you can create a container group, which allows multiple containers to sit on the same host machine. By adding containers to a group, they share a lifecycle, resources, a local network and storage volumes. This is similar to the pod concept in Kubernetes.
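As an illustration, the sketch below creates a simple single-container group with the azure-mgmt-containerinstance Python SDK. The subscription ID, resource group, region and image are placeholders, and exact model and method names can vary slightly between SDK versions:

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.containerinstance import ContainerInstanceManagementClient
    from azure.mgmt.containerinstance.models import (
        Container, ContainerGroup, ResourceRequests, ResourceRequirements,
    )

    client = ContainerInstanceManagementClient(
        DefaultAzureCredential(), "<subscription-id>"
    )

    container = Container(
        name="hello",
        image="mcr.microsoft.com/azuredocs/aci-helloworld",
        resources=ResourceRequirements(
            requests=ResourceRequests(cpu=1.0, memory_in_gb=1.5)
        ),
    )

    # Containers in the same group share a lifecycle, local network and storage
    # volumes, much like a Kubernetes pod; add more Container objects as needed.
    group = ContainerGroup(location="uksouth", containers=[container], os_type="Linux")

    client.container_groups.begin_create_or_update(
        "my-resource-group", "my-container-group", group
    ).result()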

Because ACI allows containers to be run in isolation, it suits batch or automation/middleware workloads which are not tightly coupled to other parts of the system. The removal of orchestration features makes it easy for anyone to quickly use containers in their projects. Ideal use cases include running app build tasks or doing some data transformation work.

Azure Batch

If you have a workload that needs a lot of compute for a limited amount of time, Azure Batch is a great service that is simple to use and understand. You create an Azure Batch account and then one or more pools of VMs. The VM selection is vast and includes GPU-backed SKUs, ideal for rendering and AI tasks.

From there, you create a job in which one or more tasks are created. When the job is run, the VMs work in parallel to accelerate the processing of the task(s). There are manual and auto scaling features to ensure you have sufficient compute power to complete the job in the required timeframe. Azure Batch supports the use of spot instances, which are excess capacity in Azure datacentres sold at a fraction of the cost, with the proviso that Azure can reclaim them without notice if it needs the resource back – ideal for VMs you only need to spin up when a job is being run.
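To show the pool / job / task shape of the service, here is a rough sketch using the azure-batch Python SDK. The account details, VM image, node agent SKU and command line are all placeholders, so treat this as an outline rather than a ready-made script:

    import azure.batch.models as batchmodels
    from azure.batch import BatchServiceClient
    from azure.batch.batch_auth import SharedKeyCredentials

    # A Batch account URL normally looks like https://<account>.<region>.batch.azure.com
    credentials = SharedKeyCredentials("<account-name>", "<account-key>")
    client = BatchServiceClient(credentials, batch_url="<batch-account-url>")

    # A small pool of Ubuntu VMs; the image reference and node agent SKU are illustrative.
    client.pool.add(batchmodels.PoolAddParameter(
        id="render-pool",
        vm_size="standard_d2s_v3",
        target_dedicated_nodes=2,
        virtual_machine_configuration=batchmodels.VirtualMachineConfiguration(
            image_reference=batchmodels.ImageReference(
                publisher="canonical",
                offer="0001-com-ubuntu-server-jammy",
                sku="22_04-lts",
                version="latest",
            ),
            node_agent_sku_id="batch.node.ubuntu 22.04",
        ),
    ))

    # A job is a container for tasks; the pool's nodes work through tasks in parallel.
    client.job.add(batchmodels.JobAddParameter(
        id="render-job",
        pool_info=batchmodels.PoolInformation(pool_id="render-pool"),
    ))
    client.task.add(
        job_id="render-job",
        task=batchmodels.TaskAddParameter(
            id="task-1",
            command_line="/bin/bash -c 'echo rendering frame 1'",
        ),
    )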

Use cases for Azure Batch include data processing on huge datasets, rendering workloads, and AI/ML or scientific work that requires large-scale parallel and high-performance computing (HPC), which would ordinarily require organisations to run large compute clusters on premises.

Azure Service Fabric

If you are building cloud native projects that go beyond just containers, then Azure Service Fabric is a good contender to consider. It is a distributed systems platform that aims to simplify the development, deployment and ongoing management of scalable and reliable applications.

Service Fabric supports stateless and stateful microservices, so there is potential to run containerised stateful services in any language. It powers several Microsoft services including Azure SQL Database, Azure Cosmos DB and Dynamics 365. As Microsoft's container orchestrator, Service Fabric can deploy and manage microservices across a cluster of machines. It can do this rapidly, allowing for high-density rollout of thousands of applications or containers per VM.

You are able to deploy Service Fabric clusters on Windows Server or Linux, on Azure and other public clouds. The development environment in the Service Fabric SDK mirrors the production environment. Service Fabric integrates with popular CI/CD tools like Azure Pipelines, Jenkins and Octopus Deploy. Application lifecycle management is supported across the various stages of development, deployment, monitoring, management and decommissioning.

Azure Spring Apps

The Spring framework is a way of deploying Java applications in a portable environment, with security and connectivity handled by the framework so deployment is quicker and simpler. Spring gives you a solid foundation for your Java applications, providing essential building blocks like dependency injection, aspect-oriented programming (AOP) and transaction management.

Azure Spring Apps provides a fully managed service, allowing you to focus on your app whilst Azure handles the underlying infrastructure. It allows you to build apps of all types including web apps, microservices, event-driven, serverless or batch. Azure Spring Apps allows your apps to adapt to changing user patterns, with auto and manual scaling of instances.

Azure Spring Apps supports both Java Spring Boot and ASP.NET Core Steeltoe apps. Being built on Azure, you can integrate easily with other Azure services such as databases, storage and monitoring. There is an Enterprise offering which supports VMware Tanzu components, backed by an SLA.

Azure Red Hat OpenShift

Red Hat OpenShift is an enterprise-ready Kubernetes offering, enhancing Kubernetes clusters with a platform that provides tools to help develop, deploy and manage your application.

Built on Red Hat Enterprise Linux and combined with the security and stability of Azure, it offers enhancements in areas like source control integration, networking features, security and a rich set of APIs, with hybrid cloud at the heart of its design.

Red Hat OpenShift is versatile and can support a number of use cases including web services, APIs, edge computing, data-intensive apps and legacy application modernisation. Being built for enterprise use, a large number of Fortune 500 companies use Red Hat OpenShift – a testament to the value proposition it brings.

Conclusion

As we can see, there is a whole array of compute offerings within Azure. Deciding which to use will depend on use case, cost and how the application will interact with other services or the outside world.

Sometimes a particular compute service may be ideal for dev/test but a different service may be better once the app is in production. In other cases, multiple compute types may be used for larger, more complex projects.

Take time to consider which services would satisfy your requirements, then weigh up the merits and challenges of each of them before making a decision.

Exploring Certifications: Microsoft Azure Solutions Architect Expert

Widely accepted as the pinnacle of Azure certifications, many choose to aim for the Azure Solutions Architect certification after completing several fundamentals and associate-level certifications in the Azure space. It is an expert-level certification and covers the architecture design of cloud computing in Azure. Whilst one of the required exams relates to administration, the principal focus of an architect is planning the infrastructure and choosing the best services for a given workload, factoring in customer and regulatory requirements.

Let’s look at this certification in more detail.

Who is this certification for?

Being an expert-level certification, it assumes some existing knowledge and experience in IT, and more specifically in the Azure cloud environment. You could be gaining knowledge through learning, through practice or a combination of the two. Someone who is in an Azure administrator or helpdesk role may consider this certification to move up into becoming a cloud architect. A cloud architect, specifically an Azure cloud architect, will help organisations transition to the cloud or improve existing cloud assets by rearchitecting them into cloud-native solutions, potentially adding the ability to scale and/or designing redundancy into their applications.

Considering a career in cloud architecture

Internally at Microsoft, Azure customer success managers (CSMs) can move into an Azure cloud solution architect (CSA) role, and obtaining this certification is highly advantageous for the CSA position; you could also potentially join the company as a CSA if Microsoft is recruiting externally. Azure also has many partners and end-user customers, many of whom will be recruiting for cloud architects.

Exam requirements

Previously, this certification was achieved by passing two exams, one regarding the technology and one regarding the design – AZ-303 and AZ-304 were the last iterations of this format. Now there are still two exams to pass, but one is a certification in itself and is likely already in many Azure professionals' portfolios: the AZ-104, Microsoft Azure Administrator. The other exam is the AZ-305, Designing Microsoft Azure Infrastructure Solutions. You can take the exams in either order, but the Microsoft Certified: Azure Solutions Architect Expert certification is not awarded until both exams have been passed.

The certification is valid for one year and you can revalidate it year on year by passing a renewal assessment. You can take the assessment from 180 days before expiry right up to the expiry date. You don't have to renew the Azure administrator certification to keep the Azure architect certification, but it would be nice to think you would renew all your certifications as they become eligible.

In a previous blog post, we have gone through the AZ-104 exam and related certification, so in this post we will cover the AZ-305.

Topics covered

Let's follow Microsoft's own learning path material, starting with the prerequisite set of modules they provide, which covers the core architectural components of Azure and describes compute, networking and storage services. There is a module on identity, access and security and another on the Microsoft Cloud Adoption Framework for Azure. The prerequisite modules conclude with an introduction to the Microsoft Azure Well-Architected Framework. Your experience, and how recently you have covered these areas, will determine whether you want to work through these or not. Now, let's continue with the actual modules that are part of the AZ-305 and cover the skills measured.

Role-based access control is a central feature of identity and governance

The first learning path is titled design identity, governance, and monitor solutions. Most of this should be familiar to those who have already completed the Azure administrator certification. The first module in this learning path is design governance, which deals with the management group > subscription > resource group hierarchy as well as tags, policies, role-based access control (RBAC) and landing zones. This is followed by design authentication and authorization solutions, which is very Entra ID heavy, including business-to-business (B2B), business-to-consumer (B2C), conditional access, identity protection, access reviews, service principals and managed identities. There is also a section on Azure Key Vault. The last module in this learning path is design a solution to log and monitor Azure resources, which covers Azure Monitor, Log Analytics workspaces and Azure Data Explorer.

The next learning path in the series is design business continuity solutions. It includes the describe high availability (HA) and disaster recovery (DR) strategies module, which covers HA and DR for PaaS and IaaS resources, Recovery Time Objective (RTO) and Recovery Point Objective (RPO) considerations, and what to plan for in hybrid (cloud and on-prem) scenarios. The other module in this learning path is design a solution for backup and disaster recovery, which focuses on Azure Backup, specifically for Azure blob, Azure Files, Azure virtual machine and Azure SQL backup and recovery. Lastly for this module, designing for Azure Site Recovery is included.

The third AZ-305 learning path is design data storage solutions, which begins with a module on designing a data storage solution for non-relational data. This is all things storage accounts, specifically blob storage and Azure Files. Also covered are Azure managed disks, data redundancy and storage security. The next module is, not surprisingly, design a data storage solution for relational data, covering Azure SQL Database, Azure SQL Managed Instance, SQL Server on Azure Virtual Machines and Azure SQL Edge. Items you are asked to consider include database scalability, availability and security for data at rest, in transit and in use. To conclude the module, we have table storage and the Cosmos DB Table API. The third and final storage solutions module is design data integration, where the candidate is asked to consider solutions that involve Azure Data Factory, Azure Data Lake, Azure Databricks, Azure Synapse Analytics and Azure Stream Analytics. An important part of this data integration section is designing strategies for hot, warm and cold data paths.

Azure Migrate is a suite of tools to aid cloud onboarding

The largest section in the skills measured, some 30-35% of the exam score, is designing infrastructure solutions, so we will go through what is required in this subject area now. The first module is design an Azure compute solution and covers a large number of Azure compute services including virtual machines, Azure Batch, Azure App Service, Azure Container Instances (ACI), Azure Kubernetes Service (AKS), Azure Functions and Azure Logic Apps. Choosing the right compute service is a key part of cloud architecture, so it is important to have these down pat.

Next is design an application architecture, which mostly covers Azure event and messaging solutions, namely Azure Queue Storage, Azure Service Bus, Azure Event Hubs and Azure Event Grid. There is a section on designing an automated app deployment solution using ARM templates or Bicep. Also covered in the apps section are Azure Cache for Redis, Azure API Management and Azure App Configuration.

The number of components mentioned in the design network solutions learning path is considerable. It begins with general networking considerations: thinking about IP addressing, selecting a region and choosing a topology; hub-and-spoke is the most popular, so expect this to feature in the exam. Azure Virtual Network NAT and route tables (system and user-defined routes (UDRs)) are included as well. The section in the module on on-premises connectivity to Azure virtual networks expects a knowledge of when to use Azure VPN Gateway or Azure ExpressRoute (with optional VPN failover) and when Azure Virtual WAN may be appropriate. Staying with networking, a section is dedicated to application delivery services, which mainly deals with load balancing solutions, namely Azure Front Door, Azure Traffic Manager, Azure Load Balancer and Azure Application Gateway. You are expected to know when to use a given solution depending on regional or global requirements, whether it works at OSI layer 4 or 7, and whether the workload is internal or public facing. You should also know when to use Azure Content Delivery Network (CDN). To wrap up networking there is the section on designing application protection services, which again contains a lot of services, including Azure DDoS Protection, Azure Private Link, Azure Web Application Firewall, Azure Firewall, network security groups (NSGs), service endpoints, Azure Bastion and just-in-time (JIT) network access.

Design migrations is the final module of the infrastructure learning path. It begins with understanding the Azure migration framework as part of the wider Cloud Adoption Framework. The module then develops into leveraging tools that assist with the migration journey, including Service Map, the Azure Total Cost of Ownership (TCO) Calculator, Azure Migrate, Data Migration Assistant (DMA), Database Migration Service, the Azure Cosmos DB Data Migration tool and Azure Resource Mover. The migration section concludes with the various methods to get data in and out of Azure: Azure Storage Migration Service, Azure File Sync, the Azure Import/Export service, AzCopy, Azure Storage Explorer and Azure Data Box are services that can be used to migrate your data. That is a lot, but remember, this is design, so you won't be going into these services in any great detail, only knowing when to use a solution for a given scenario.

The penultimate learning path for the AZ-305 exam is build great solutions with the Microsoft Azure Well-Architected Framework. This is an established process to follow to give a project in the cloud a great chance of success. The Microsoft Azure Well-Architected Framework consists of five pillars:

  • Cost optimization
  • Operational excellence
  • Performance efficiency
  • Reliability
  • Security

The candidate is expected to understand each of these pillars so that these important factors are taken into account when architecting a solution. To help with learning, each pillar has its own module within the learning path.

Considering SQL DB as a service instead of SQL on VMs

The final learning path is accelerate cloud adoption with the Microsoft Cloud Adoption Framework for Azure. The concept here is to understand the goals, evaluate the project from an IT, financial and operational perspective, and bring along stakeholders to champion the cloud adoption through its various stages. There is a whole module on using Azure landing zones to support your requirements for cloud operations, as well as other modules on migration best practice, building in resilience and designing with security in mind. As part of the adoption journey, there needs to be consideration of the minimum viable product, how project effectiveness will be measured and what success looks like.

Exam hints and tips

The first piece of advice, whether you are seasoned in Azure or not, is to complete the fundamentals and administrator certifications before attempting this exam. There is a fair bit of crossover, and keeping the broad topics fresh is a good way to build up to the more complex concepts. Also, if possible, try not to leave too much of a gap between taking them. Keeping the momentum going is a good way of not forgetting things already learned.

In many Azure certifications, it is often recommended to get hands-on practice with the different types of resources as well as learning the theory. The design infrastructure solutions exam, however, is just that: design. The implementation comes in the administrator exam, so this one is much more high level and is about describing best-practice solutions, not the nuts and bolts of creating a resource. In a way, this exam has a lot in common with the Azure fundamentals exam – although of course it is markedly more difficult.

Following some hints and tips from others can help

Life is busy, and this is a big exam and a big deal for your career and professional recognition. As such, if you can, reserve more time for study just before the exam date, so you walk into the exam with a bulk of recently stored knowledge. Make provisions with home and/or work to have more time for a last push, but keep it balanced. After several hours a day, it becomes counter-productive to try to endure even more learning. Also, don't cram on the day of the test – by then the adrenaline will be blocking your ability to properly concentrate. My advice is also not to book the exam for the evening unless you are generally asleep in the daytime. These exams are long and take stamina. Early to mid-afternoon works well for me.

It's always worth booking the exam before you are fully ready, to try and set a learning pace. If it gets close to the date and you feel you are still miles off, you can reschedule (or even cancel for a refund), so long as it is more than 24 hours before the exam start time. A lot of these exams comes down to confidence: if you aren't sure whether you'll pass or not, give it a go anyway. If you don't pass, at least you will have some understanding of how far off the pass mark you are and what troubled you the most, so you can pass on the next attempt. I have often practised with a real exam in this way – sometimes I pass to my surprise, sometimes not, and that is OK too.

There is more exam advice, much of which applies to this certification as well, in the Azure administrator and Azure fundamentals posts.

Recommended resources

This section is going to seem like a stuck record if you have read the AZ-900 and AZ-104 posts, but it has to be said: regardless of what third-party resources you use to assist with your learning, you should consume the official Microsoft AZ-305 exam learning paths. They are curated to cover all aspects of the skills measured, so if it's not in this content, it's unlikely to be on the exam. There are some exercises in the prerequisite modules, but the rest of the learning path is information only (being a design exam rather than an administrator exam, that makes sense, right?).

John Savill must be mentioned again. As discussed in previous posts, John's YouTube content equals or surpasses much of the commercially available courses out there. Not only does John give up his free time to produce this huge body of work, he also refuses to monetise his YouTube channel, so you don't even see ads! For this exam, John provides an entire playlist of videos relevant to the exam, including his hugely popular AZ-305 study cram.

John Savill’s AZ-305 is essential viewing before taking the exam

Beyond those two free resources, there is plenty of other free material online, and many popular websites such as Pluralsight, Udemy, LinkedIn Learning and Cloud Academy offer a dedicated AZ-305 course. I haven't reviewed any of these so cannot comment on their quality, so check out what is on offer with any paid subscriptions you already have, or ask others who have recently certified which courses they used.

Next steps

Once you have achieved the Microsoft Certified: Azure Solutions Architect Expert certification, you really do have many options on what to choose next – we could almost list every Azure certification here. What you do next in terms of certification will depend a lot on your strengths, your interests and perhaps some influencing factors, such as encouragement by your current employer to follow a certain path that is compatible with a skill shortage they have identified. Or perhaps you have been reading articles in the IT industry press about an overall shortage of skilled people in a certain IT category and you think a good career move would be to qualify in that area of expertise.

Now you have one expert-level certification, there are a couple of others in the Azure space – DevOps Engineer Expert and Cybersecurity Architect Expert – both of which require a couple of exams to gain the qualification, but in some cases you may already have one of these exams from working towards other goals. For example, the AZ-104 is one of the two exams required for the Azure Solutions Architect Expert certification, but it can also be used along with the AZ-400 to obtain the DevOps Engineer Expert certification.

There are plenty of associate-level certifications in all sorts of areas of Azure such as data engineering, networking, security, AI, development and so on. There are also speciality certifications in subjects such as Cosmos DB, Azure Virtual Desktop and Azure for SAP workloads.

Spotlight on: Azure Storage Accounts

What is an Azure Storage Account?

An Azure storage account creates a globally unique namespace that contains the base parameters for what the storage account will be used for and where it is located. Typically the storage account contents will match some performance, locational, security or billing requirement, or a combination of these; therefore an Azure subscription may have more than one storage account if requirements differ. Storage can be used for any application you can think of, for a vast array of structured, semi-structured and unstructured data.

What are the main things you need to consider when creating a storage account?

Firstly, you need to pick a storage account name. It has to be unique across all of Azure and be between 3 and 24 characters long, using lowercase letters and numbers only. The name will form part of the URI for accessing the storage account.

Next, you need to pick your region: the geographic location of the datacentres that will host your data. Generally this will come down to which region is closest to your customers or the users of the data, but other factors such as cost may also be considered. There is also the ability to deploy to municipal areas dotted around the world, called Edge Zones, but this isn't as popular.

Another important choice is performance. Choosing standard will give you a general-purpose v2 account. This is the most popular option and supports blob storage (including data lake storage), queue storage, table storage and Azure file storage. Choosing premium gives you premium block blobs, page blobs or file share storage. Premium storage has higher throughput and/or lower latency, but you cannot mix storage types such as block blobs and file shares in the same storage account.

There are legacy storage account types, general-purpose v1 and Blob storage, but these are not considered best practice for the vast majority of cases and are deprecated, with a retirement date of 31st August 2024.

Architect your storage to ensure data is as close to your users as possible

The last major decision is the redundancy level you require. The options are as follows:

Locally redundant storage (LRS) stores three copies of the data in a single datacentre in the selected region. It is the lowest-cost option and provides at least 99.999999999% (11 nines) durability and at least 99.9% availability for read and write requests using the hot tier (access tiers are covered later).

Zone-redundant storage (ZRS) distributes three copies of the data across different datacentres in the same region (availability zones). Each zone has independent power, networking and cooling, but the zones are still close enough together to keep latency between them very low. This option provides at least 99.9999999999% (12 nines) durability and at least 99.9% availability for read and write requests using the hot tier*.

Geo-redundant storage (GRS) stores three copies of the data in a datacentre in one region, like LRS, but then asynchronously copies it to another region (the paired region – most regions have one) and keeps three copies in a datacentre there too. GRS offers durability for storage resources of at least 99.99999999999999% (16 nines) over a given year. Read and write availability is at least 99.9%*.

Read-access geo-redundant storage (RA-GRS) uses the same replication method as GRS, but you can also read the data from the secondary region. This can have advantages in bringing data closer to users and means data can be read without having to initiate a failover, as would be the case with plain GRS (or GZRS). Adding read access to the secondary location increases read availability to at least 99.99%*.

Geo-zone-redundant storage (GZRS), as the name suggests, is a combination of ZRS and GRS, where there are three copies of the data spread over availability zones in the primary region and three copies in a single datacentre at the secondary location (an LRS arrangement). The durability for this method is also at least 99.99999999999999% (16 nines). Read and write availability is at least 99.9%*.

Read-access geo-zone-redundant storage (RA-GZRS) enhances GZRS by allowing read access at the secondary location without a failover, which increases read availability to at least 99.99%*.

For premium storage, only the LRS and ZRS options are available due to the performance requirement. Azure managed disks and Azure Elastic SAN are also limited to LRS and ZRS.

* 99% for cool tier.
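Putting the earlier choices together, here is a minimal sketch using the azure-mgmt-storage Python SDK to create a general-purpose v2 account with zone redundancy. The subscription ID, resource group, account name and region are placeholders, and exact method names can differ slightly between SDK versions:

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.storage import StorageManagementClient

    client = StorageManagementClient(DefaultAzureCredential(), "<subscription-id>")

    # Name must be globally unique, 3-24 characters, lowercase letters and numbers only.
    poller = client.storage_accounts.begin_create(
        "my-resource-group",
        "mystorageaccount123",
        {
            "location": "uksouth",
            "kind": "StorageV2",              # general-purpose v2 (standard performance)
            "sku": {"name": "Standard_ZRS"},  # e.g. Standard_LRS, Standard_GRS, Standard_GZRS, Premium_LRS
        },
    )
    account = poller.result()
    print(account.primary_endpoints.blob)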

Block blobs, append blobs, page blobs, queue storage, table storage and Azure Files: what are the different storage types and when might I use them?

Blobs are stored in a security boundary called a container, where you can set access permissions. There are three types of blob: block, append and page. They can be accessed by users and client apps via HTTP or HTTPS. There are multiple methods of access, including the Azure Storage REST API, Azure PowerShell, Azure CLI or an Azure Storage client library, which are available for a number of popular programming languages. Blob storage is a flat filesystem but can be made to emulate a folder structure by prefixing the filename with a would-be folder followed by "/", for example images/hello.jpg.

Block blobs are a popular storage type in any cloud provider. They are ideal for storing unstructured files such as images, text and video. Typically these will be used to host content for a website or app, or for streaming audio and video. Block blobs can be as big as 190.7 TiB.
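As a brief illustration, the sketch below uploads a block blob with the azure-storage-blob Python library, using the images/ prefix from the earlier example to emulate a folder. The account URL and container name are placeholders, and authentication is assumed to be set up via Entra ID and RBAC:

    from azure.identity import DefaultAzureCredential
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient(
        account_url="https://<account>.blob.core.windows.net",
        credential=DefaultAzureCredential(),
    )
    container = service.get_container_client("website-assets")

    # The "images/" prefix emulates a folder within the flat blob namespace.
    with open("hello.jpg", "rb") as data:
        container.upload_blob(name="images/hello.jpg", data=data, overwrite=True)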

Next are append blobs, which are optimised for appending to files as opposed to random-access writes – useful for things like logging. The maximum size for an append blob is slightly more than 195 GiB.

Page blobs store virtual hard drive (VHD) files for Azure virtual machine disks and can be up to 8 TiB in size. Although this was once the way VM disks were stored, this method (unmanaged disks) is being deprecated, with managed disks being a simpler-to-set-up option where the storage account is abstracted away and size and performance are easier to adjust.

Queue storage is used for storing messages that are up to 64 KB in size. A queue can run into millions of messages, with the maximum governed by the overall storage account limit. Queues help decouple parts of an application, which is ideal for architecting in the cloud. For example, messages are created by one part of an application, perhaps a .NET app using the queue storage client library, and can then be processed by another component in the application architecture, such as Azure Functions. For messages over 64 KB, ordering guarantees (FIFO) or other more advanced functionality, consider Azure Service Bus queues.
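To illustrate the decoupling idea, here is a small sketch with the azure-storage-queue Python library: one component enqueues a message and another dequeues it later. The connection string, queue name and message body are placeholders:

    from azure.storage.queue import QueueClient

    queue = QueueClient.from_connection_string("<connection-string>", queue_name="orders")
    queue.create_queue()

    # The producer side of the application enqueues work...
    queue.send_message('{"orderId": 1001, "action": "process"}')

    # ...and a separate consumer (an Azure Function, say) dequeues and deletes it.
    for message in queue.receive_messages():
        print(message.content)
        queue.delete_message(message)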

Lastly there is table storage. This is used to store tabular data, which is semi-structured and follows no fixed schema. This kind of data is now often stored in Cosmos DB, which allows for more performance and scale, but there may be cases where table storage in a storage account is preferable, such as cost.
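For completeness, a short sketch using the azure-data-tables Python library shows the schemaless shape of table entities. The connection string, table name and property names are placeholders:

    from azure.data.tables import TableClient

    table = TableClient.from_connection_string("<connection-string>", table_name="Devices")
    table.create_table()

    # Entities are schemaless apart from the mandatory PartitionKey and RowKey.
    table.create_entity({
        "PartitionKey": "sensors",
        "RowKey": "device-001",
        "temperature": 21.5,
        "online": True,
    })

    for entity in table.query_entities("PartitionKey eq 'sensors'"):
        print(entity["RowKey"], entity["temperature"])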

What is Azure Data Lake Storage Gen2?

Azure Data Lake Storage Gen2 (ADLS Gen2) is built on blob storage, and selecting the hierarchical namespace option when creating the storage account unlocks the features of ADLS Gen2.

You can turn the data lake capability on when creating your storage account, using either general-purpose v2 or premium block blob. Being blob storage, it can store files of any type, whether structured, semi-structured or unstructured, although a data lake is used a lot in data analytics, where data would generally be structured or semi-structured.

Azure Data Lake Storage Gen2 is ideal for data analysis workloads

One of the biggest differences between blob storage and a data lake is the use of a directory structure, useful in ELT (extract, load and transform) data pipelines for big data analytics. By having folders, you can apply POSIX-style access control lists (ACLs) in addition to RBAC. Additionally, Microsoft Purview can create security policies for files and folders based on classifications.
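A short sketch with the azure-storage-file-datalake Python library shows the difference in practice: real directories and POSIX-style ACLs rather than a flat namespace. The account URL, filesystem (container) name, directory path and ACL string are placeholders:

    from azure.identity import DefaultAzureCredential
    from azure.storage.filedatalake import DataLakeServiceClient

    # Note the dfs endpoint rather than the blob endpoint.
    service = DataLakeServiceClient(
        account_url="https://<account>.dfs.core.windows.net",
        credential=DefaultAzureCredential(),
    )
    filesystem = service.get_file_system_client("raw")

    # A real directory, courtesy of the hierarchical namespace.
    directory = filesystem.create_directory("sales/2024")

    # POSIX-style ACL: owning user gets full access, group read/execute, others nothing.
    directory.set_access_control(acl="user::rwx,group::r-x,other::---")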

As well as the Azure Blob API, developers and data engineers can use the ABFS driver for Hadoop that communicates with the DFS API to work with a hierarchical directory structure.

Being built on blob storage, ADLS Gen2 offers most of the benefits of a standard or premium blob service, such as access tiers, redundancy levels, NFS and SFTP. There are some limitations when using ADLS Gen2 compared with blob storage, so check the latest official documentation for what they currently are.

ADLS Gen1 is to be retired on 29th February 2024, so it has not been described here.

What security considerations does an Administrator need to factor in when managing a storage account?

We need to secure access to the data with credentials, whether access is via a user or an application. The overarching authorisation method is the storage account key. These keys are the "keys to the kingdom", giving access to everything in the storage account, so using them directly is not considered best practice. Each account has two storage account keys that can be rotated manually, or automatically if the keys are integrated with Azure Key Vault. The idea of having two keys is that your application can switch to the second key whilst you rotate the first.

An evolution of this, and considered a good solution, especially when third-party users or apps are involved, is shared access signatures (SAS). Shared access signatures allow you to build a URI that gives granular, time-limited access to account and/or data plane functions. Once a signature has been created you cannot revoke it until it expires, but you can rotate the storage account key that was used to generate it (key 1 or key 2).

A generally best-practice authorisation method, when you have users or managed identities in your tenant that require account, container or data rights, is access control (IAM), which works under Azure role-based access control (RBAC), prevalent throughout all Azure services. With RBAC, you can assign users, groups or managed identities to a built-in or custom role. This allows fine-grained access to control and data plane features.

Azure storage shared access signatures (SAS) options
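As a sketch of how a SAS is generated in code, the example below uses the azure-storage-blob Python library to create a read-only token valid for one hour. The account, container, blob and key values are placeholders:

    from datetime import datetime, timedelta, timezone
    from azure.storage.blob import BlobSasPermissions, generate_blob_sas

    sas_token = generate_blob_sas(
        account_name="<account>",
        container_name="reports",
        blob_name="2024/summary.pdf",
        account_key="<account-key>",                # the key used to sign the token
        permission=BlobSasPermissions(read=True),   # read-only access
        expiry=datetime.now(timezone.utc) + timedelta(hours=1),  # time-limited
    )

    url = f"https://<account>.blob.core.windows.net/reports/2024/summary.pdf?{sas_token}"
    print(url)  # hand this URL to the third party; it stops working after an hour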

Moving on to networking considerations, let's start with encryption in transit. When creating a new storage account, the require secure transfer setting is on by default and should be left that way. This setting blocks HTTP access to the data via the API and rejects unencrypted SMB connections; deactivating it will allow HTTP access as well as HTTPS. Unless there is a specific use case, it is recommended to keep this setting on. Note: currently, using a custom domain will not allow you to use secure transfer, so the setting has no effect when a custom domain is in use.

Every storage account has a firewall with three settings. Disabled blocks all traffic, so the only way to access the data is via a private endpoint. Enabled from all networks is great if the storage account is used for hosting website assets, images for a product catalogue for example. The third option, enabled from selected virtual networks and IP addresses, does just that: it allows the administrator to specify which virtual networks can access the storage account, to allow access by IP address or CIDR range, and finally to allow access by Azure service type and scope, such as "Microsoft.EventGrid/Topics in this resource group". If specifying a virtual network, you must enable the service endpoint on the virtual network(s) you specify, applying Microsoft.Storage for regional storage accounts or Microsoft.Storage.Global for cross-region storage accounts.

There are some other options to consider on the storage account firewall. There are firewall exceptions you can turn on, such as "allow read access to storage logging from any network". You can also specify Microsoft routing or Internet routing, which determines where the traffic traverses to connect the client and the storage account.

A really good solution for storage account security is to select disabled in the firewall settings and then set up a private endpoint. A private endpoint works over Azure Private Link, where the storage traffic travels on the Microsoft backbone and has a special network interface in a subnet you select, which is assigned an IP address from your subnet's CIDR range. Once the private endpoint has been established in your chosen subnet, you can set up connectivity beyond that subnet or VNet with networking features such as VNet peering, custom routes and on-prem connectivity via VPN or ExpressRoute.

Lastly in this section we have encryption. We've already discussed encryption in transit in the networking section above. Azure Storage also provides encryption at rest with a technology called Storage Service Encryption (SSE). This is on for all storage accounts and cannot be deactivated. All data is stored using 256-bit AES encryption and is FIPS 140-2 compliant. Data is stored using Microsoft-managed keys by default, but an administrator can also use customer-managed keys, which must be stored in Azure Key Vault (AKV) or Azure Key Vault Managed Hardware Security Module (HSM), or customer-provided keys, which allow the client to provide the key on request to blob storage – these keys can be stored in Azure key vaults (standard or HSM) or another key store. As well as whole accounts, you can specify encryption scopes for blob storage at the container level or for an individual blob. Another encryption option available is infrastructure encryption, for double encryption of data (infrastructure and service).

What backup and recovery options are available?

To restore files that have been accidentally or maliciously deleted, you can turn on soft delete and set a retention period from 1 to 365 days. If a file gets deleted, you have until the end of the retention period to undelete it.

To automatically retain copies of a file as it existed before the latest amendment, you can turn on blob versioning. To restore a previous version, you select the version you wish to restore, recovering your data if it has been modified or deleted. Blob versioning is often used alongside soft delete to form an overall data protection strategy. Storing lots of versions carries a cost consideration and can create latency when listing blobs; for this reason it is recommended to change the access tier of, or delete, older versions using lifecycle management, which is covered later in this post.

You can also take a snapshot of a blob. This is a manual process that appends a DateTime value to the blob URI. Snapshots exist until they are deleted, either independently or as part of a delete blob operation. Versioning may be best practice, but at the time of writing versioning doesn't work with ADLS Gen2, whereas snapshots do (in preview).
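Here is a minimal sketch of taking and listing snapshots with the azure-storage-blob Python library, where the connection string, container and blob names are placeholders:

    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string("<connection-string>")
    blob = service.get_blob_client(container="reports", blob="2024/summary.pdf")

    # Creates a read-only snapshot, addressed by the blob URI plus the DateTime value returned here.
    snapshot = blob.create_snapshot()
    print(snapshot["snapshot"])

    # Snapshots appear when listing with the appropriate include flag.
    container = service.get_container_client("reports")
    for item in container.list_blobs(include=["snapshots"]):
        print(item.name, item.snapshot)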

If you are using geo-redundant storage (GRS / RA-GRS / GZRS / RA-GZRS) and the primary region suffers a failure, you can initiate a customer-managed failover, where upon completion the secondary region becomes the primary and the old primary becomes the secondary. If Microsoft detects a failure at the primary region deemed severe enough, the system may initiate a failover automatically without customer intervention. This is called a Microsoft-managed failover.

For business-critical data that must not be changed or deleted after being created, you can apply a WORM (write once, read many) state called immutable storage. Set at the account, container or blob version level, a time-based retention policy allows blobs to be created and read, but not modified or deleted. After the retention period has expired, objects can be deleted but not overwritten. There is also a legal hold type of immutable storage, which can be set at blob level. A legal hold prevents deleting or modifying an object until the hold is explicitly cleared. If any container has version-level immutability enabled, the storage account is protected from deletion; likewise, a container cannot be deleted whilst a blob has an immutable storage policy active.

If you enable soft delete, change feed and blob versioning on your storage account, you can take advantage of point-in-time restore for block blobs, where you set the maximum restore point in days in the storage account. From then, if you need to restore a file (or entire container, or virtual directory), you can choose a restore date and time in UTC.

To protect the storage account itself from being removed or its configuration modified, you can apply an Azure Resource Manager lock. Locks come in two types: a CannotDelete lock prevents the account being deleted but allows the configuration settings to be changed, whereas a ReadOnly lock prevents changing the configuration or deleting the storage account. Both locks still allow the account configuration to be read.

NB: some of these features will not work when the hierarchical namespace is enabled, so check the latest official documentation to confirm which data protection options you have if that applies to your configuration.

What services can import or export volumes of data to or from Azure storage?

There are various ways to get data in and out of Azure storage, so let's have a look at some of them. Firstly, if you are only transferring a small amount of data, you can use a command-line or graphical tool.

For command line, you can use Azure CLI, PowerShell or the dedicated AzCopy program, which supports concurrency and parallelism, and the ability to resume copy operations when interrupted.

If you want to use a GUI, then you can upload and download files in the Azure portal, or there is Azure Storage Explorer, available on Linux, Mac and Windows.

Microsoft Azure Storage Explorer

If you are doing a massive data transfer exercise or moving data from somewhere with little or no Internet access, you may want to use the Azure Import/Export service, which allows you to prep and send SATA HDDs or SSDs to Microsoft to securely upload or download data. For larger data still, Azure has physical appliances it can ship to and from customers to do data transfers: Data Box Disk and Data Box Heavy, which are encrypted, with logistics handled via an approved carrier.

For regular transfers of big data, you may want to use a service such as Azure Data Factory, where you build data pipelines between source and sink and can perform in-transit transformations, among other features. Similarly, you can use pipelines and activities in Azure Synapse Analytics. Microsoft also has Azure Data Box Gateway, which is a virtual appliance you can run in your co-location datacentre or on premises. Azure Data Box Gateway transfers data to and from Azure via the NFS and SMB protocols.

If you need a programmatic ability to transfer data, then the Data Movement library for .NET and .NET Core is an option, or there are the Azure Storage REST API and the Azure Blob Storage libraries for .NET, Java, C++, Python, Go and JavaScript. If you have static website mode activated, you can deploy your static website assets via Terraform.

There are other tools that can be used, including AdlCopy, DistCp, Sqoop, PolyBase, the Hadoop command line, the Azure Data Factory integration runtime and the new Microsoft Fabric data analytics platform.

Any best practice or cost saving tips?

A good first step is to consider the amount of redundancy you need for a given workload and break the varying requirements into different storage accounts. For example, a customer-facing global website may benefit from having data in two regions, so RA-GRS or RA-GZRS may be ideal, but if you have some internal documents primarily used in one region, LRS or ZRS may provide all the redundancy required, at a lower cost.

Access tiers are designed to get maximum value for money, depending on the access requirements of your data. The four tiers are hot, cool, cold and archive. The hot tier is the most expensive for storage but costs less to access and transact with the data, whilst at the other extreme the archive tier is cheap offline storage for backup or compliance data, but access comes at a higher cost and the data should be stored for a minimum of 180 days. In the middle there is the cool tier, used for infrequently accessed or modified data, which should be stored for a minimum of 30 days. For rarely accessed data that still needs to remain online, there is the cold tier, where data should be stored for a minimum of 90 days. If you move anything to the archive tier, note that it will take hours to move it back to the hot, cool or cold tier, and it isn't modifiable whilst in the archive tier. Blobs incur an early deletion penalty if they are deleted or moved to a different tier before the minimum number of days required by the tier. A storage account has a default access tier, which is inherited by the blobs in the account but can be changed at the blob level.
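Changing a blob's tier can be done from code as well as the portal; here is a minimal sketch with the azure-storage-blob Python library, with the connection string, container and blob names as placeholders:

    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string("<connection-string>")
    blob = service.get_blob_client(container="manuals", blob="discontinued-model.pdf")

    # Move an infrequently accessed blob from hot to cool to reduce storage cost.
    # Moving to "Archive" is also possible, but rehydrating it back online takes hours.
    blob.set_standard_blob_tier("Cool")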

Developing the access tiers concept, Azure has functionality to automate the tiering of blobs, called lifecycle management. This allows blobs to move to different access tiers, or be deleted, depending on the rules you create, which is best practice for cost and data-compliance purposes. By using lifecycle management, you take some of the guesswork out of deciding when blobs are candidates to change to a more suitable tier. For example, a PDF manual for a product on a manufacturer's website could be read and updated regularly, so it would be ideal for the hot tier, but once the model is discontinued and time passes, it could be accessed so infrequently that moving it to cool or cold would be better suited, which can be done automatically using lifecycle management. At some point, the data may be archived or deleted, which can also be automated. Lifecycle management policies can be applied to block and append blobs in general-purpose v2, premium block blob and Blob Storage accounts. System containers such as $logs or $web are not affected by these policies.
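Lifecycle rules are defined as a policy on the storage account; the sketch below sets one up with the azure-mgmt-storage Python SDK, tiering blobs under a manuals/ prefix to cool after 30 days without modification, to archive after 180 days, and deleting them after five years. The subscription, resource group, account name, rule name and prefix are placeholders:

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.storage import StorageManagementClient

    client = StorageManagementClient(DefaultAzureCredential(), "<subscription-id>")

    rule = {
        "enabled": True,
        "name": "age-out-manuals",
        "type": "Lifecycle",
        "definition": {
            "filters": {"blob_types": ["blockBlob"], "prefix_match": ["manuals/"]},
            "actions": {
                "base_blob": {
                    "tier_to_cool": {"days_after_modification_greater_than": 30},
                    "tier_to_archive": {"days_after_modification_greater_than": 180},
                    "delete": {"days_after_modification_greater_than": 1825},
                }
            },
        },
    }

    client.management_policies.create_or_update(
        "my-resource-group",
        "mystorageaccount123",
        "default",                        # the management policy name must be "default"
        {"policy": {"rules": [rule]}},
    )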

Reserved capacity allows users to commit to storage in units of 100 TiB and 1 PiB (higher discounts for 1 PiB blocks), for a one-year or three-year term. Committing gives a discount that will vary depending on the length of time you reserve for, the region, the total capacity you reserve, the access tier and type of redundancy that you are using. It is worth doing the analysis on the Azure pricing calculator to illustrate the discount and commitment spend involved.


Azure pricing calculator will illustrate reserved capacity vs PAYG costs

There are options you can toggle in your storage account which carry extra charges, for example SFTP, which is charged by the hour. To reduce the cost, consider having an Azure Automation runbook toggle the service off and on depending on the hours it is required in your organisation. In this scenario, you could also consider using the Azure Storage Explorer application for your team, which would avoid the cost and management overhead of using SFTP altogether.

How can I find out more information?

There are lots of other parts of Azure storage accounts you can investigate, depending on your requirements. We could have covered them here, but this post would have become a book. Notable focus areas to consider are monitoring, Defender for Cloud, static website mode, Front Door and CDN, custom domain names, cross-origin resource sharing (CORS) and blob inventory.

If you want to be certified in Azure storage, then look at the DP-203 exam, which, upon passing, will give you the Microsoft Certified: Azure Data Engineer Associate certification. As well as Azure storage and ADLS Gen2, the certification covers Azure Data Factory, Azure Synapse Analytics, Azure Stream Analytics, Azure Event Hubs and Azure Databricks.
