Introduction

I’ve been fortunate to have had access to the private previews of the Azure Resource Manager “deployment stacks” feature since early 2023. Now that this resource manager enhancement is generally available, it is a good moment to share some of my initial experience with it. In this blog post I’ll explain how deployment stacks improve working with Azure resource deployments and create a better story for managing the lifecycle of groups of resources.

Introducing: “Deployment Stacks”

Deployment stacks are a concept in Azure resource management that provides more control over how the lifecycle of a collection of Azure resources is managed. They do so by letting you essentially “label a set of resources” and treat them as one group/entity that shares the same lifecycle. This creates a strong bond between otherwise relatively independent resources. Additionally, they provide improved control over how Azure Resource Manager (ARM) behaves when a resource is changed or is no longer part of the stack. Let’s look in a bit more detail at how this provides more control over resource deployment than the basic ARM deployment modes we know and love/loathe.

Resource management concepts: complete and incremental

Until now there were basically two ways to deploy resource templates to a given scope (management group, subscription or resource group): complete mode and incremental mode. Complete mode removes all resources that are no longer part of the template and adds or updates the resources that are part of the deployed template for the given scope. Incremental mode does not touch resources that are no longer part of the template; it only adds or updates the resources that are in the template and ignores everything else.
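
As a minimal sketch (assuming a resource group named my-rg and a template called main.bicep), the mode is chosen per deployment:

# Incremental: only adds/updates what is in the template, leaves everything else alone
az deployment group create --resource-group my-rg --template-file main.bicep --mode Incremental

# Complete: additionally removes resources in my-rg that are not in the template
az deployment group create --resource-group my-rg --template-file main.bicep --mode Complete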

The challenge with complete deployments

Complete deployments do keep the slate very clean, but they present a lot of challenges whenever you want to change or swap one resource for another, or combine resource deployments into a single resource group or subscription. They are also a big source of anxiety, because if you screw up, things will be removed unapologetically, causing all sorts of embarrassment and fixing.

Small steps

In the scenario where we aim to swap out one resource for another, it basically comes down to making multiple deployment steps to achieve something resembling the incremental deployment model, effectively making the developer the Azure Resource Manager manager in order to reach the desired outcome.

For example, take an app service that currently uses a storage account for data persistence and needs to move to a database. When you want to “swap” that storage for a Cosmos DB, you’d first have to add the Cosmos DB, deploy the change, migrate the data using a script, then remove the storage from the template and deploy again in complete mode to end up with the new infrastructure.

If you’ve messed things up in that last delete step, you cannot easily go back, because the storage resources are already removed (storage account, blob service, blob container, blob data). That’s not really a safe way to operate, but it sure is a way to keep a clean landscape in Azure ;-).

Enterprise scale

Having multiple complete deployments target a single subscription is basically impossible. You get so much entanglement between the resources, and so many resources in a single template, that it defeats the purpose of proper lifecycle management of those resources. This becomes problematic, if not impossible, in situations where policies create or manage specific resources, or where responsibility for subscription resources is split (as with Enterprise Scale Landing Zones, where different teams manage different resources in a single subscription).

The challenge with incremental deployments

So, if complete mode deployments are not ideal, how do incremental mode deployments compare? Incremental mode is a lot safer and friendlier to use, but you can still run into problems, especially around lifecycle management and lingering resources that accumulate cost and waste. This happens because in incremental mode ARM doesn’t look beyond the contents of the template it receives, so some strange things can and will happen.

For example, lingering NSG rules can create “open pathways” to newly introduced resources: rules that were part of the previous resource configuration are not removed automatically when the latest template is deployed.

There are also ways to end up with conflicting resource changes if a resource is part of multiple templates or scripts (which is easy to overlook). And then there is the possible creation of orphaned resources. Remember, resources that linger around increase the attack surface and still add cost, so manual clean-up is needed. That requires discipline and persistence in checking changes and identifying things to clean up. Given all of that, incremental mode is still a lot safer to use than complete mode, but it is not ideal.

Introducing deployment stacks

To address the problem of resource lifecycle management, deployment stacks were introduced. By grouping resources together and managing their lifecycle “as one”, a whole lot of the previously identified problems are tackled. Such a group receives a unique name that is tracked in ARM. Any subsequent deployment against that uniquely defined stack causes ARM to create a diff against the current state of the tracked set of resources, resulting in an increment (a new snapshot) of the resource configuration.

Detach or delete

On each deployment we can set the behavior for dealing with resources that are no longer part of the stack: detach or delete. With detach we opt to clean things up manually; with delete we instruct ARM to remove the resources that are no longer part of the stack. In the NSG example mentioned earlier, the rule will now be cleaned up when the resource that required it no longer exists. This is a good example of the lifecycle dependency created between resources that share the same stack, and it shows the importance of selecting stack members that should share the same lifecycle.
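
With the Azure CLI this behavior is controlled through the --action-on-unmanage parameter. A sketch, assuming a placeholder stack name and template:

# Detach: resources that drop out of the template are left in place, to be cleaned up manually
az stack sub create --name myStack --location westeurope --template-file main.bicep --action-on-unmanage detachAll --deny-settings-mode none

# Delete: resources that drop out of the template are removed (deleteAll also removes resource groups)
az stack sub create --name myStack --location westeurope --template-file main.bicep --action-on-unmanage deleteResources --deny-settings-mode none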

Locking a stack

To prevent a stack from being altered by a different deployment (or even Azure Policy!) you can set the lock-down behavior of the stack to denyDelete or denyWriteAndDelete. Using denyWriteAndDelete creates a read-only resource configuration that can only be updated through updating the stack itself. Fortunately, to accommodate the use of policy, you can add the policy agent’s objectId to a list of excluded principals so the two play nicely together, but it remains a powerful mechanism to lock down and prevent resource changes. The beauty is that it doesn’t require locking a larger scope such as an entire resource group, so other deployments can still safely take place without changing the stack.
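
A minimal sketch of a locked stack, again with placeholder names:

az stack sub create `
    --name myLockedStack `
    --location westeurope `
    --template-file main.bicep `
    --action-on-unmanage detachAll `
    --deny-settings-mode denyWriteAndDelete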

Configuration drift prevention, but not quite DSC

It is worth noting that this locking behavior is not the same as protection against configuration drift. It can limit other actors’ ability to change the resources in the stack, but there is no continuous reconciliation or reporting on the state. For true detection of configuration drift, have a look at the latest preview of DSC v3.0.

Use case for exemption

A nice use case for the exemption can be found in centralized network security automation, where you may opt to allow the network stack to be altered by a well-known objectId representing the network security automation. This allows an application outside the deployment scope to alter the network resources in the stack, for instance when a threat is found or the settings for capturing packets need to be changed.
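
In the CLI this exemption is expressed through the deny-settings exclusions; a sketch with a hypothetical objectId placeholder:

az stack sub create `
    --name myNetworkStack `
    --location westeurope `
    --template-file network.bicep `
    --action-on-unmanage detachAll `
    --deny-settings-mode denyWriteAndDelete `
    --deny-settings-excluded-principals "<objectId-of-network-automation>"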

First steps

To get up and running you’ll have to install the latest bits for Azure PowerShell and the Azure CLI.

Check the latest version of Azure CLI

You will need an Azure CLI version that supports the specific bits for deployment stacks. To check if you have them you can run:


az stack --help

That should give you the options for az stack and not give an error. If it does give you an error, download the latest version of the Azure CLI and pick a version later than 2.44.1, because deployment stacks are not available prior to that version.
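
To see which version you are currently running, or to update in place, you can use:

# Show the installed Azure CLI version
az version

# Update the Azure CLI in place (availability depends on how it was installed)
az upgrade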

CMK Storage account experiment

For my experiment I’ve created a set of Bicep templates that compose a solution defining a resource group with a storage account that uses a customer-managed key (CMK) from a key vault. The recipe is to create:

  • A Resource group
  • A Key Vault to store the key encryption key
  • A key encryption key for the storage account
  • A managed Identity for the storage account to access Key Vault
  • A Role assignment to the managed identity so it can wrap/unwrap the key
  • A Storage account utilizing these previously mentioned resources
  • A blob storage service
  • A blob container

Mixed scopes

Since this is a subscription-scoped deployment, we have to separate the resources into a few layers: the resource group is created at subscription scope, the resource group contents are then created at resource group scope, and the role assignment for the key vault is created at key vault scope.

The CMK storage bicep example is found in my GitHub repo.

Example in Git repo

git clone https://github.com/CarloKuip/DeploymentStacks.git

Deploying the stack

To deploy the stack, use the Azure CLI and provide, for example, these parameters:

az stack sub create `
    --name myStorageWithCmk `
    --action-on-unmanage deleteResources `
    --template-file main.bicep `
    --location westeurope `
    --description "My storage with CMK deployment stack" `
    --deny-settings-mode None

Of course you can change the parameters etc. by providing a parameters file or changing the source. Feel free to experiment.

In the Azure portal, the stack shows up as a specific resource type that you can navigate to and whose contents you can explore.
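
The stack can also be inspected and cleaned up from the CLI; a sketch reusing the stack name from the example above:

# List the deployment stacks in the current subscription
az stack sub list --output table

# Show the details and managed resources of this stack
az stack sub show --name myStorageWithCmk

# Remove the stack and everything it manages (resources and resource groups)
az stack sub delete --name myStorageWithCmk --action-on-unmanage deleteAll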

The complete command line reference for deployment stacks using Azure CLI can be found here: Deployment stacks using Azure CLI

A word of caution

First and foremost, deployment stacks are a very nice addition to the available tool set for creating resources in Azure. But while working with them, I also encountered a few “peculiarities”.

Sequence of events

Be aware of the sequence of events a stack deployment goes through when adding, changing or deleting parts of the stack and redeploying. You may end up in a situation where, for example, a network becomes inaccessible for a bit because an NSG rule is removed before the new one is added. The same applies to PaaS firewall settings and role assignments; there is a whole lot more to consider.

To figure out how ARM deals with these redeployments, make sure you test the deployment, check the Deployments blade in the Azure portal and look into the sequence details of the deployment. This will help you prevent unplanned downtime for your resources. I personally don’t think there is a single right way to go about this, so you’ll have to experiment to address the needs of your specific scenario. To understand what is happening, look at the deployment output in the portal or run the CLI with the --debug flag.
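
For example (the deployment name is a placeholder you can look up in the Deployments blade):

# Re-run the stack deployment with verbose client/server tracing
az stack sub create --name myStorageWithCmk --template-file main.bicep --location westeurope --action-on-unmanage deleteResources --deny-settings-mode None --debug

# Inspect the individual operations of the underlying deployment
az deployment operation sub list --name <deployment-name> --output table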

Further reference

Have a look at the GitHub repo for deployment stacks to read up on the details. Azure on GitHub