Fast incremental Pulumi deploys with Garden

Þórarinn Sigurðsson

September 29, 2022

Earlier this year, an enterprise client asked us to create a Pulumi plugin. Why?

Their project consists of 30+ Pulumi stacks (each of which is deployed separately). Checking whether each stack needed an update meant repeated, parallel calls to the Pulumi CLI, which was slow and resource-intensive.
They had no way to deploy their Pulumi stacks in dependency order.
There was no simple way to get a project-level summary of proposed changes.

Happy to comply, we released a plugin for Pulumi that dramatically speeds up incremental deploys for large Pulumi projects by using Garden's graph engine and versioning system to deploy Pulumi stacks in dependency order.

Before we dive into our Pulumi plugin, though, let's take a closer look at Pulumi itself, the motivation behind its design, and how it fits into the broader landscape of Infrastructure-as-Code tooling.

For those of you who prefer video, I did a presentation on Garden's Pulumi plugin which you can find below.

Infrastructure as Code: Pulumi vs. Terraform

Under the hood, Infrastructure as Code (IaC) tools like Kubernetes, Terraform and Pulumi work by treating infrastructure as data.

All of them use the concept of resources (nested JSON-like data structures) to represent the state of the underlying system (e.g. how many copies of which containers to run, what the network config looks like, how databases and cloud services are configured), and use changes to those resources as the API for changing the state of the system.

For those of you with a background in frontend dev, this might sound familiar: It's very similar to the virtual DOM concept from frameworks like React. As a user, all you do is make changes to the virtual DOM. The underlying machinery then figures out the difference between the previous and the new virtual DOMs, and updates the actual browser DOM as needed.

Updating infra with an IaC tool means updating the data structures that represent the infra, and leaving the tool to actually execute the changes. The Code part of Infrastructure as Code is how we tell the IaC tool to update these data structures.
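To make the "infrastructure as data" idea concrete, here is a minimal sketch in Python of the kind of diffing an IaC tool does. It is not how any particular tool implements it; the resource names and fields are made up for illustration.

```python
from typing import Any

Resource = dict[str, Any]


def diff_resources(
    current: dict[str, Resource], desired: dict[str, Resource]
) -> dict[str, list[str]]:
    """Classify each named resource by comparing current and desired state."""
    plan: dict[str, list[str]] = {
        "create": [], "update": [], "delete": [], "unchanged": []
    }
    for name in sorted(current.keys() | desired.keys()):
        if name not in current:
            plan["create"].append(name)
        elif name not in desired:
            plan["delete"].append(name)
        elif current[name] != desired[name]:
            plan["update"].append(name)
        else:
            plan["unchanged"].append(name)
    return plan


# Hypothetical state: what is running now vs. what the code says we want.
current = {
    "web": {"image": "web:1.0", "replicas": 2},
    "db": {"engine": "postgres", "version": "14"},
}
desired = {
    "web": {"image": "web:1.1", "replicas": 2},
    "db": {"engine": "postgres", "version": "14"},
    "cache": {"engine": "redis"},
}

plan = diff_resources(current, desired)
```

The user only edits `desired`; working out that `web` needs an update and `cache` needs to be created is left to the tool.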

How we update these data structures is really a matter of taste, and is down to the design of the tools we use.

Config files: The good, the bad and the ugly

The most common approach these days is to use configuration files with template strings that get evaluated by the tool just before deployment.

When things aren't too dynamic (i.e. we don't need a lot of boolean expressions, mapping or filtering), this is a great approach. The configuration looks very close to the resource definitions we'll be sending over as the new desired state of the system:



# Taken from https://learn.hashicorp.com/tutorials/terraform/aws-build#write-configuration

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.16"
    }
  }

  required_version = ">= 1.2.0"
}

provider "aws" {
  region  = "us-west-2"
}

resource "aws_instance" "app_server" {
  ami           = "ami-830c94e3"
  instance_type = "t2.micro"

  tags = {
    Name = "ExampleAppServerInstance"
  }
}

On the flip side, when we do find ourselves needing conditionals, loops and data structure operations in our config, things quickly grow hairy and hard to read:

There's nothing wrong with the logic here per se—this is a public Helm chart for Postgres, and it's only natural to support a lot of config options to cater to the vast array of use-cases out in the wild.

The problem is that the authors of this Helm chart were forced to squeeze a lot of logic into these inline template expressions. If this had been written in a full-powered programming language, it could have looked more like this (this is pseudocode and not intended to be an accurate translation, but bear with me):
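A rough sketch of that idea in Python. It is not derived from the actual chart: the value names (`user`, `password_secret`, `tls`) and the environment variable names are hypothetical, chosen only to show where the conditionals end up.

```python
def build_env(values: dict) -> list[dict]:
    """Collect container env vars; all conditional logic lives here."""
    env = [{"name": "POSTGRES_USER", "value": values.get("user", "postgres")}]
    if values.get("password_secret"):
        env.append({
            "name": "POSTGRES_PASSWORD",
            "valueFrom": {
                "secretKeyRef": {"name": values["password_secret"], "key": "password"}
            },
        })
    if values.get("tls", {}).get("enabled"):
        # Hypothetical flag name, for illustration only.
        env.append({"name": "POSTGRES_ENABLE_TLS", "value": "yes"})
    return env


def render_stateful_set(values: dict) -> dict:
    """Return a StatefulSet-shaped dict with almost no logic in the literal."""
    name = values["name"]
    labels = {"app": name}
    container = {"name": name, "image": values["image"], "env": build_env(values)}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "serviceName": name,
            "replicas": values.get("replicas", 1),
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
        },
    }


manifest = render_stateful_set(
    {"name": "pg", "image": "postgres:14", "tls": {"enabled": True}}
)
```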

This is a lot easier to read and reason about—there's less repetition, and much less logic in the data structure literal itself. You can skim the return value, and look at the definitions of any helper functions and local variables that are relevant to what you're thinking about when you come to the file.

The whole point of configuration files is to be simple to read and simple to reason about—that's no longer the case past a certain threshold of complexity in the template logic.

To be fair, HashiCorp's HCL (which Terraform uses) is a lot nicer than Helm's templating system and ships with a bunch of useful template helper functions, but even there, there are no user-defined functions.

This puts a hard limit on abstraction and code reuse, and makes it hard to test the template logic (short of simply performing a dry-run deployment).

Enter Pulumi: Infrastructure as Code. Except really.

Pulumi has a fresh take on all this.

It embraces the core patterns of infrastructure as code (representing infra as data structures, diffing resources with their running versions) but uses full-powered programming languages (TypeScript, Go, etc.) instead of config files to render out the resource definitions.

Here's an example of Pulumi being used to perform some pretty neat deployment logic: Put a config file from the local machine into an S3 bucket, and then read the contents of that bucket into a Kubernetes Deployment resource just as it's being defined:

This is a great example of the flexibility and expressiveness that's unlocked by using a real programming language for deployment logic:

We can define and use helper functions and classes to encapsulate custom logic we want to reuse elsewhere (see the definition of s3Helpers here).
We can unit test any helpers we write!
We can use local variables to reduce repetition in the resource definitions and make them easier to read.
We can read files from the local filesystem and use them to create resources, and use any third-party libraries/tools we need (like curl above).
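As a small illustration of the "helpers we can unit test" point, here is a sketch in Python (one of the languages Pulumi supports). The `resource_name` helper and its naming convention are hypothetical; the point is only that deployment logic written as a plain function can be tested without deploying anything.

```python
import re


def resource_name(
    project: str, environment: str, component: str, max_length: int = 63
) -> str:
    """Build a lowercase, hyphen-separated name, truncated to max_length."""
    parts = [project, environment, component]
    slug = "-".join(
        re.sub(r"[^a-z0-9]+", "-", part.lower()).strip("-") for part in parts
    )
    return slug[:max_length].rstrip("-")


def test_resource_name() -> None:
    """A plain unit test: no cloud credentials or dry-run deploy needed."""
    assert resource_name("Shop", "Prod", "API Gateway") == "shop-prod-api-gateway"
    assert len(resource_name("a" * 80, "dev", "db")) <= 63


test_resource_name()
```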

This raises the question: Why do people use config files at all?

The downside: With great power comes great responsibility

The main drawback of using a full-powered programming language for deployment logic is the flip side of that very same power: Without discipline, the logic can become tangled spaghetti code that's hard to maintain and reason about.

Calling curl in a deployment script (like we saw above) may seem cool at the time of writing, but if it's done in a helper class somewhere deep in a dusty corner of the codebase, the results can be surprising and hard to debug.

("It's doing WHAT?" a bewildered SRE shouts a couple of years later after a bizarre production incident on a Friday afternoon.)

That said—if you use this power responsibly and maintain good discipline and organization, it can yield great results!

Measure twice, cut once

In February 2022, Pulumi announced an experimental --plan option for the pulumi up command, which lets users apply a previously saved plan (generated by calling pulumi preview --save-plan). At the time of writing (September 2022), this option is still experimental (the latest stable Pulumi version is 3.40.1).

This is very similar to how Terraform's apply command can be passed a path to a pre-prepared plan.
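The two-step flow can be sketched as a pair of command builders in Python. The --save-plan and --plan flags are the ones described above; the stack name and plan path are hypothetical, and since the feature is experimental, check the Pulumi docs for your version before relying on the exact invocation.

```python
import subprocess


def preview_command(stack: str, plan_path: str) -> list[str]:
    """Step 1: generate a plan and save it to a file for review."""
    return ["pulumi", "preview", "--stack", stack, "--save-plan", plan_path]


def apply_command(stack: str, plan_path: str) -> list[str]:
    """Step 3: apply the reviewed plan without an interactive prompt."""
    return ["pulumi", "up", "--stack", stack, "--plan", plan_path, "--yes"]


def run(command: list[str]) -> None:
    """Run a command, raising if the CLI exits with a non-zero status."""
    subprocess.run(command, check=True)


# Hypothetical usage: the review (step 2) happens between these two calls.
save = preview_command("prod", "plan.json")
apply = apply_command("prod", "plan.json")
```

The Terraform equivalent is `terraform plan -out=<file>` followed by `terraform apply <file>`.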

For sensitive components where deployment mistakes can cause major disruptions, such as databases and network configuration, many companies require a code review process where reviewers check proposed deployment plans. They can then rest assured that only the changes in the plan they've approved get applied.

Improving the workflows around this three-stage "plan, review and deploy" process for Pulumi projects with lots of stacks was one of the main motivations for writing our Pulumi plugin.

Garden's Pulumi plugin: Why we wrote it—and how it works

When an enterprise customer of ours asked us to write a Pulumi plugin for Garden, we were happy to oblige.

This company uses Pulumi not only to deploy infrastructure, but also to deploy the Kubernetes resources for their services.

While this had generally been working great, they had a few issues, which we already mentioned above:

Their project consists of 30+ Pulumi stacks (each of which is deployed separately). Running pulumi preview for each of them to figure out if an update is needed meant repeated, parallel calls to the Pulumi CLI, which was slow and resource-intensive.
They had no way to deploy their Pulumi stacks in dependency order.
During code review, figuring out which stacks had no-op preview plans (i.e. where no changes were needed) and which had plans involving actual updates was frustrating and time-consuming. There was no simple way to get a project-level summary of the proposed changes.

Our Pulumi plugin is a relatively thin wrapper around the Pulumi CLI that makes use of Garden's built-in graph engine, templating system and versioning semantics.

Thanks to those foundations, it was relatively simple to write, and we were quickly able to solve all three of the above!
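To illustrate what "dependency order" means here, the following sketch uses Python's standard-library `graphlib` to group stacks into batches that can be deployed in parallel. This is not Garden's graph engine, just a minimal stand-in for the idea, and the stack names are hypothetical.

```python
from graphlib import TopologicalSorter


def deploy_batches(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group stacks into batches; each batch depends only on earlier ones."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


# Each stack maps to the set of stacks it depends on.
stacks = {
    "network": set(),
    "database": {"network"},
    "cluster": {"network"},
    "api": {"database", "cluster"},
}

batches = deploy_batches(stacks)
```

Stacks within a batch have no dependencies on each other, so they can be deployed concurrently, while each batch waits for the previous one to finish.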

A Pulumi module is configured as follows:

Once you've added Garden configs for your project's Pulumi stacks, you can:

Run garden deploy to deploy your entire project, which now includes your Pulumi stacks: Any running services that haven't changed since the last deploy will be unchanged, and checking their status is lightning-fast—no expensive diffing required!
Run the garden plugins pulumi preview command to render a project-level summary of the changes made to any/all stacks when compared with the running stacks. Perfect for the three-phase "plan, review and deploy" workflow we described above.
Use one of the other plugin commands (garden plugins pulumi cancel | refresh | destroy | reimport) to run a Pulumi CLI command in dependency order for all (or a subset) of your Pulumi stacks, with full access to Garden's templating logic (our reimport command runs pulumi stack export | pulumi stack import). These let you operate on any or all of the Pulumi stacks in your project at once, which simplifies a lot of common workflows for larger projects.
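A project-level summary boils down to collecting per-stack change counts and separating the no-op stacks from the ones with real updates. The sketch below assumes a hypothetical input shape (operation name mapped to a count per stack, with "same" meaning unchanged); it is not the plugin's actual output format.

```python
def summarize(previews: dict[str, dict[str, int]]) -> str:
    """Render one line per stack, flagging stacks whose preview is a no-op."""
    lines = []
    for stack in sorted(previews):
        changes = {
            op: count
            for op, count in previews[stack].items()
            if op != "same" and count > 0
        }
        if changes:
            detail = ", ".join(
                f"{count} to {op}" for op, count in sorted(changes.items())
            )
            lines.append(f"{stack}: {detail}")
        else:
            lines.append(f"{stack}: no changes")
    return "\n".join(lines)


# Hypothetical per-stack change counts gathered from individual previews.
previews = {
    "network": {"same": 12},
    "api": {"same": 4, "update": 2, "create": 1},
}

summary = summarize(previews)
```

A reviewer can then skip straight to the stacks that report changes instead of opening 30+ individual previews.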

In a nutshell, our Pulumi plugin works as follows:

To deploy, we run pulumi up and create a stack tag containing the Garden service version.
To check the status of a service, we read the deployed Garden service version from the stack tag (thus leveraging the Pulumi backend as a shared datastore).
Because reading a stack tag only requires a fast, lightweight REST API call, we can quickly and cheaply determine whether a redeploy is needed.
This is made possible by Garden's versioning semantics, which factor in the source code of the Pulumi stack, plus any runtime configuration values, and the versions of all of the service's dependencies.
Garden environments map to stack names. We merge any config variables provided in the Pulumi service configuration with any stack config variables already in place, and write the merged config to a file, which we then pass to pulumi up during deploys.
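The steps above can be sketched in a few lines of Python. This is a simplified illustration, not Garden's implementation: the hashing scheme, the version format, and the choice to let Garden-provided variables win when merging config are all assumptions made for the example.

```python
import hashlib
import json
from typing import Optional


def service_version(
    source_hash: str, config: dict, dependency_versions: list[str]
) -> str:
    """Derive a version from the source, runtime config and dependency versions."""
    payload = json.dumps(
        {"source": source_hash, "config": config, "deps": sorted(dependency_versions)},
        sort_keys=True,
    )
    return "v-" + hashlib.sha256(payload.encode()).hexdigest()[:10]


def needs_deploy(current_version: str, deployed_tag: Optional[str]) -> bool:
    """Redeploy if the stack has no version tag or the tag is out of date."""
    return deployed_tag != current_version


def merge_stack_config(existing: dict, overrides: dict) -> dict:
    """Merge config; overrides taking precedence is an assumption of this sketch."""
    return {**existing, **overrides}


version = service_version("abc123", {"replicas": 2}, ["v-dep1", "v-dep2"])
bumped = service_version("abc123", {"replicas": 2}, ["v-dep1", "v-dep3"])
merged = merge_stack_config(
    {"aws:region": "us-west-2", "replicas": 1}, {"replicas": 2}
)
```

Because a changed dependency version changes the service's own version, a stale tag is enough to tell that a redeploy is needed, without running a full preview.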

The net result?

While we haven't received precise measurements yet, this customer reports that incremental deployment—that is, deploying to an existing environment—is several times faster than with vanilla Pulumi for this project.

And thanks to the project-level summary of proposed changes enabled by the garden plugins pulumi preview command, they've finally been able to implement a "plan, review and deploy" process around production deployments with Pulumi.

TL;DR: If you're using Pulumi and have several Pulumi stacks in your project, Garden is a great way to speed up incremental deployments and to implement a review process around Pulumi plans!

To learn more, check out the Pulumi guide in our docs (there's a very simple example project to go with it).
