Kubernetes deployments are complicated, with containers relying on a suite of interacting technologies. These technologies can have different versions and different configurations, as can their dependencies. They're often deployed in multiple types of environments, such as production or testing, each of which has its own requirements.
When deploying software in different environments, changes are made according to each individual use case. This can lead to your deployments drifting away from your original, baseline configuration. As these changes add up, your systems will eventually start behaving inconsistently across environments. Unfortunately, these issues can be difficult to diagnose and fix, especially since the changes are often undocumented.
This process is commonly known as configuration drift. With time, your deployments get more and more divergent from the original configuration, and realigning them gets harder.
In this article, you'll learn how to prevent configuration drift, how to fix it, and how to detect it when it occurs.
Why you need multiple Kubernetes environments
There are several scenarios where having multiple Kubernetes environments is beneficial. Here are some examples:
Separation of production, development, and testing
It's common to have separate environments for code at different stages of the product lifecycle, such as production, development, and testing. If you want to test a new Kubernetes version, a service mesh, or some other piece of network software, then it makes sense to do that in a different environment.
These different environments will naturally diverge from one another as a consequence of their different roles. Development and test environments may not use real data. Live environments might not include testing features. Memory and CPU allocations can change per environment, too.
Regulatory compliance
Compliance can often mean you have to run your environments in separate clusters or virtual private networks (VPNs). Regulations, often written with other points in this list in mind, are there to make sure sensitive data is safe and secure, and that can include mandating specific locations for certain kinds of data.
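For example, the General Data Protection Regulation (GDPR) restricts transfers of personal data about people in the EU to countries outside the European Economic Area unless specific safeguards are in place, so many organizations choose to keep that data in EU-hosted environments. You have to factor that into your architecture if you're offering services globally.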
Security
For security, it's always better to have your production environment separated from development and testing. Changes can result in new vulnerabilities, and your customer-facing services can be more exposed to attack.
Having multiple environments means that if a malicious actor gains access to one environment, then that access is limited. With a single environment, a vulnerability in one place can lead to attackers gaining access to your entire system.
Role-specific tuning
It's common to provision servers for different roles and for load distribution. This means that you can tune each server according to the role it plays.
For instance, a database server might need more memory. A public-facing server might need stricter timeouts to help mitigate distributed denial-of-service (DDoS) attacks. An AI service can benefit from more CPU time. These things can be implemented via configuration changes, leading to drift.
How to identify configuration drift
Knowing what configuration drift is isn't the same as being able to spot it. You need to actively look for it, and you need to know what to look for.
Keeping track of your configuration files is essential. You need a log of changes. GitOps achieves this, as does using software designed to look for drift issues. Tools like wksctl can also help you manage your clusters and configurations using Git.
You also need to list the software and files that need to be monitored. Every time someone updates or changes your toolset, any affected software and its dependencies need to be included.
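One way to keep such a list honest is to record a fingerprint of every monitored file and compare it later. The following is a minimal sketch, assuming the monitored files are readable from local disk; the function names `fingerprint` and `changed_files` are illustrative and not part of any tool mentioned here.

```python
import hashlib
from pathlib import Path


def fingerprint(paths):
    """Map each monitored file path to the SHA-256 digest of its contents."""
    return {str(p): hashlib.sha256(Path(p).read_bytes()).hexdigest() for p in paths}


def changed_files(recorded, current):
    """Return sorted paths that were modified, added, or removed.

    Both arguments are dicts produced by fingerprint().
    """
    all_paths = set(recorded) | set(current)
    return sorted(p for p in all_paths if recorded.get(p) != current.get(p))
```

Storing the recorded fingerprints alongside the configuration in version control would let anyone re-run the comparison, though that workflow is a suggestion rather than something prescribed by the tools above.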
Even with these processes, it's still possible to overlook issues. Having a procedure in place makes it less likely, and you have something for others to refer to when trying to identify problems.
With a versioning system, you can designate a configuration as your baseline. It's then up to anyone that makes changes to their version to document any deviation from that.
Once your system is in place, if you run into a config issue you can't eliminate, you can roll back to your base config. Don't ignore drift-related problems. If one of your systems is behaving erratically for reasons you can't identify, you may be better off rolling your configuration back to an earlier version, even if that means losing any updates made in the modified configuration.
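Comparing a live configuration against the designated baseline can be sketched in a few lines. This example assumes both configurations have already been loaded into nested dictionaries (for instance from YAML); `flatten`, `find_drift`, and the `MISSING` marker are hypothetical names chosen for illustration.

```python
MISSING = "<missing>"  # marker for a key that exists on only one side


def flatten(config, prefix=""):
    """Flatten nested dicts into dotted keys, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    flat = {}
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def find_drift(baseline, live):
    """Return {dotted_key: (baseline_value, live_value)} for every difference."""
    base_flat, live_flat = flatten(baseline), flatten(live)
    drift = {}
    for key in sorted(set(base_flat) | set(live_flat)):
        expected = base_flat.get(key, MISSING)
        actual = live_flat.get(key, MISSING)
        if expected != actual:
            drift[key] = (expected, actual)
    return drift


baseline = {"replicas": 2, "resources": {"memory": "512Mi"}}
live = {"replicas": 2, "resources": {"memory": "1Gi"}, "debug": True}
report = find_drift(baseline, live)
```

An empty report suggests the environment still matches the baseline. A non-empty one gives you a concrete list of deviations to either document or roll back.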
How to battle configuration drift
To fight configuration drift, you need a plan. That means thinking about what tools you use and, particularly, how you deploy to different environments.
Following are some tips and best practices for battling configuration drift:
Strategies for managing multiple Kubernetes environments
One of the main causes of configuration drift is poor management of multiple environments. Without a clear strategy, it's easy to let small changes spiral out of control. Here are some common strategies for managing different Kubernetes environments more effectively:
You can simply have multiple clusters and handle them separately in the same way you would handle an individual cluster. This is basic and straightforward, but you miss out on the more advanced features that a dedicated solution might provide, as detailed in the next sections.
Helm charts take a different approach: a chart ships with default values that each environment can override. However, beware: because Helm lets users easily override the provided values with their own settings, it also opens the door to drift. If you use Helm, be aware of this possibility, for example by making it clear to developers that they shouldn't override values without documenting the change.
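A simple check can flag overrides that nobody wrote down. The sketch below does not call Helm itself. It assumes the chart's default values and an environment's override values have already been parsed into dictionaries, and that your team keeps a set of documented override paths; all names here are hypothetical.

```python
_ABSENT = object()  # distinguishes "key not present" from an explicit None


def overridden_keys(defaults, overrides, prefix=""):
    """Yield dotted paths where an override changes or adds to the defaults."""
    for key, value in overrides.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        default = defaults.get(key, _ABSENT)
        if isinstance(value, dict) and isinstance(default, dict):
            # Recurse so only the leaf values that really differ are reported.
            yield from overridden_keys(default, value, path)
        elif value != default:
            yield path


def undocumented_overrides(defaults, overrides, documented):
    """Return overridden paths that do not appear in the documented set."""
    return sorted(set(overridden_keys(defaults, overrides)) - set(documented))


defaults = {"replicaCount": 1, "image": {"tag": "1.0", "pullPolicy": "IfNotPresent"}}
staging = {"replicaCount": 3, "image": {"tag": "1.0"}, "debug": True}
flagged = undocumented_overrides(defaults, staging, documented={"replicaCount"})
```

Here `image.tag` is not flagged because it matches the default, while `debug` is flagged because it was added without being documented.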
Kustomize Manifests per Environment
Kustomize lets you take a similar approach to Helm in having a central configuration that can then be overridden for each specific environment.
It uses YAML to describe your configuration and is included in kubectl, the Kubernetes command line tool.
One powerful feature of Kustomize is composition, enabling you to build config files from others. That way, you can have different versions of configuration files that you can mix and match according to the use case. For example, if you have three different service configurations and two different deployment configurations, you could store them all in separate files and then quickly combine them in whatever combination you need.
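The idea of composing a final configuration from several layers can be illustrated with a deep merge over dictionaries. This is a simplified sketch of the concept only, not Kustomize's implementation: Kustomize's own patching is richer (for instance in how it treats lists), and the names `deep_merge` and `compose` are assumptions made for this example.

```python
import copy


def deep_merge(base, overlay):
    """Return a new dict where overlay values override or extend base values."""
    merged = copy.deepcopy(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars and lists are replaced wholesale in this simplified model.
            merged[key] = copy.deepcopy(value)
    return merged


def compose(*layers):
    """Apply layers left to right; later layers win on conflicts."""
    result = {}
    for layer in layers:
        result = deep_merge(result, layer)
    return result


base = {"spec": {"replicas": 1, "template": {"image": "app:1.0"}}}
production = {"spec": {"replicas": 3}}
final = compose(base, production)
```

Because every layer is a separate object and the inputs are never mutated, the same base can be combined with any number of environment-specific layers.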
Best practices for battling configuration drift with multiple environments
There are several things you need to think about to get the most out of your different environments. The following best practices help prevent drift and keep things from getting out of hand.
Keep your environments as similar as possible
In order to help battle configuration drift, you need to keep configurations as similar as you can. With dev, production, and test environments, it often makes sense to make changes, but you can go too far. While it's extraordinarily useful to cater to different use cases, you don't want things to become completely differentiated.
Config changes are useful, but make them sparingly and only where there's a real benefit. A test environment, in particular, is more useful the more closely it resembles your other environments.
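One way to make "as similar as possible" checkable is to agree on a short list of settings that are allowed to differ and flag everything else. The sketch below assumes each environment's settings are available as a flat dictionary; the function names and the example keys are hypothetical.

```python
def divergent_keys(environments):
    """Return the set of keys whose value is not identical in every environment."""
    absent = object()  # stands in for a key that an environment does not define
    all_keys = set().union(*(cfg.keys() for cfg in environments.values()))
    return {
        key
        for key in all_keys
        # repr() lets unhashable values such as lists be compared in a set.
        if len({repr(cfg.get(key, absent)) for cfg in environments.values()}) > 1
    }


def unapproved_divergence(environments, allowed):
    """Return sorted keys that differ across environments but are not allowed to."""
    return sorted(divergent_keys(environments) - set(allowed))


environments = {
    "dev": {"image": "app:1.4", "replicas": 1, "log_level": "debug"},
    "prod": {"image": "app:1.4", "replicas": 3, "log_level": "info"},
}
violations = unapproved_divergence(environments, allowed={"replicas"})
```

In this example only `replicas` is approved to vary, so the differing `log_level` is reported for review.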
Use GitOps to simplify your workflows
GitOps, if done correctly, can make your workflows simpler. By using Git as a single source of truth for your configuration and miscellaneous files, you have a record of what has changed. Even in cases where people make undocumented changes, you can compare them relatively easily to your documented baseline.
Moreover, with GitOps, you need an operator to look for changes and notify you when they happen. Some tools, such as Helm, have their own operators. Other options include Flux and Argo CD.
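At its core, such an operator compares the desired state recorded in Git with the live state and acts on the difference. The following is a toy, in-memory sketch of that loop under stated assumptions: resources are plain dictionaries keyed by name, and `plan_reconcile` and `reconcile` are illustrative names rather than the API of Flux, Argo CD, or any other tool.

```python
def plan_reconcile(desired, live):
    """Compare desired resources (from Git) with live ones; return actions."""
    actions = []
    for name in sorted(set(desired) | set(live)):
        if name not in live:
            actions.append(("create", name))
        elif name not in desired:
            actions.append(("delete", name))
        elif desired[name] != live[name]:
            actions.append(("update", name))
    return actions


def reconcile(desired, live, notify=print):
    """Return a corrected copy of the live state, notifying about each change."""
    result = dict(live)
    for action, name in plan_reconcile(desired, live):
        notify(f"drift detected: {action} {name}")
        if action == "delete":
            del result[name]
        else:
            result[name] = desired[name]
    return result


desired = {"web": {"replicas": 2}, "db": {"replicas": 1}}
live = {"web": {"replicas": 5}, "cache": {"replicas": 1}}
plan = plan_reconcile(desired, live)
```

Whether resources missing from Git should be deleted automatically is a policy choice; this sketch deletes them only to keep the example short.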
Use ephemeral environments to save on costs and minimize drift
Ephemeral resources make deployments much cheaper and give you the agility to make changes easily. Moreover, they can help you minimize drift. Using ephemeral resources means developers don't need to do repeated builds of the same deployment at different stages. Repeating builds, with different people changing code at different times, can easily lead to drift.
Ephemeral resources also let different team members create builds in parallel, which is hard to do safely in a single, long-running test environment.
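The lifecycle of an ephemeral environment (create from the baseline, use, destroy) maps naturally onto a context manager. The sketch below is an in-memory stand-in that does not talk to Kubernetes or to any particular tool; the `registry` dictionary, the `preview-` name prefix, and the function name are assumptions made for illustration.

```python
import copy
import uuid
from contextlib import contextmanager


@contextmanager
def ephemeral_environment(baseline, registry):
    """Create a uniquely named copy of the baseline and remove it on exit."""
    name = f"preview-{uuid.uuid4().hex[:8]}"
    registry[name] = copy.deepcopy(baseline)  # each environment starts identical
    try:
        yield name
    finally:
        # Teardown runs even if the work inside the block raises an error.
        registry.pop(name, None)


registry = {}
baseline = {"replicas": 1}
with ephemeral_environment(baseline, registry) as env_a, \
        ephemeral_environment(baseline, registry) as env_b:
    registry[env_a]["replicas"] = 5  # a change in one environment...
    isolated = registry[env_b] == baseline  # ...does not leak into the other
```

Because every environment is rebuilt from the same baseline and discarded afterwards, ad hoc changes have no long-lived place to accumulate.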
Garden, a DevOps automation tool for Kubernetes, allows you to create ephemeral environments with a single click. The process is automated, which means configuration drift is much less likely to occur. It also helps speed up developer onboarding.
How Garden eliminates configuration drift
Garden was designed to solve the problems that usually accompany complicated deployments. It provides disposable development environments that you can manage using simple YAML descriptions. Under the hood, Garden collates all these descriptions into its Stack Graph, which you can then execute in any environment.
It gives you an extra layer of abstraction, from which you can manage your environments and their configurations. With that level of control and observability, you can reduce configuration drift and ensure changes are tracked and managed.
You can set deployment targets on a per-environment basis, letting you make the changes you need in a structured way. This means you get the power to specify exactly what you need without the chaos that drift can bring.
Using Garden can improve performance and make your builds more consistent.
Configuration drift can leave your developers in limbo, with time wasted on changes that are out of sync with the rest of the team. Configuration drift can creep up on you, and if you don't look out for it, you'll only detect it when the problems become overwhelming.
However, with the right strategy, you can keep things under control. Half of the battle is being aware of the problem. Putting a plan in place is a great next step, and that can be complemented by the right tools.
If you want to battle configuration drift, Garden is a great place to start. It combines rapid development, testing, and DevOps automation, empowering your developers to monitor every part of your deployment. With it, you can work faster and identify areas for improvement, ensuring all your team's work is contributing to the whole.
Written by James Konik
Uncertain if he's a coder who writes or a writer who codes, James tries to funnel as much of this existential tension as possible into both of his passions but finds it of more benefit to his writing than his software. When occasionally hopping out from behind his keyboard, he can be found jogging and cycling around suburban Japan.