One year of remote Kubernetes development: Lessons learned

Eyþór Magnússon
December 15, 2023

Imagine you're a developer working on a distributed system. It's early in the morning; you open your laptop and spin up your environment. And not just part of it: literally the entire system, with all its services, message queues, and databases.

Your laptop hums quietly along — the whole thing is running in a remote Kubernetes (K8s) cluster.

You make a few changes to the service you’ve been working on and observe in real time how they affect the rest of your system with tools like Grafana and Jaeger, which are typically reserved for production. 

Testing your changes is straightforward. Whether it’s end-to-end, integration, or load tests, you can run them from the convenience of your laptop as you code. Pairing and code review also become a breeze when anyone from your team has access to a fully running system with your changes. 

Once you’re happy with your work, you can confidently push to CI, which is set up the exact same way. No surprises there. And you can remain confident when your changes land in production because production-like environments tend to work — well — like production. 

Making the switch to remote development

At Garden, this is how our developers have been working for the past year. Making the move to using Kubernetes for development has come with its challenges — but it’s absolutely been worth it.

Full disclosure: at Garden we’re building a DevOps automation platform that enables these kinds of workflows. But I’m not here to talk about our platform; I’m here to share our experience of using Kubernetes — with all its quirks and complexity — as a development environment. I’ll explain how we’ve set things up and how we’ve been able to shift left and empower developers without increasing cognitive load and context switching.

Why remote development environments?

If you’re reading this, you’re likely already using Kubernetes for production workloads. That means you probably have reproducible infrastructure, the required configuration (Dockerfiles, manifests, etc.), and the internal know-how to operate Kubernetes clusters.

The idea behind also using Kubernetes for development is to shift those resources left and empower developers with the same tooling that site reliability engineers (SREs) and operators have.

Removing the friction between the different stages of software delivery enables you to ship to production faster. That has certainly been the case for us. Beyond that:

  • Having production-like development environments reduces production bugs. Period.
  • Developers don’t need to wait for CI to run end-to-end and integration tests — and writing new tests is straightforward.
  • The same configuration and tooling can be used across all stages of delivery, reducing maintenance and drift.
  • Distributed systems are hard, and there are a lot of failure modes. Being able to inspect and interact with the entire system right from the beginning helps developers understand it, avoid pitfalls, and fix issues faster when things go wrong in production.
  • Development clusters have different characteristics from production clusters and can surface some interesting K8s gotchas. This can be a valuable learning experience and a good preparation for dealing with production issues.

 Below we’ll share some best practices and ways to avoid common pitfalls.

Setup and best practices for remote Kubernetes

Here is an opinionated guide on how to set things up, based on our own experience.

  • Use your existing tools and configuration: Both Helm and Kustomize are great for overriding production values in other environments. For example, you may want to set the replica count to 1 in development (see the overlay sketch after this list).
  • Make it a “one-step deployment”: Because things go wrong, developers may need to redeploy their stack several times throughout the day. Having to run multiple steps each time causes a lot of friction.
  • Isolate environments with Kubernetes namespaces: You can template a developer's shell username into the namespace name to give each developer a unique namespace for their stack. As a simple example, your deployment script might run kubectl apply --namespace my-project-$USER <rest-args>. A lot of tools can automate this for you (see below).
  • Ensure unique hostnames via ingresses: Imagine Erica and Enes are both developing an app called Furby. If you add a DNS wildcard entry and Transport Layer Security (TLS) certificate for *.furby.dev.company.com, Erica and Enes can now both deploy their version of the app with an Ingress pointing to either erica.furby.dev.company.com or enes.furby.dev.company.com. And again, templating can be done using Helm or Kustomize (see the Ingress sketch after this list).
  • Ensure the team has support: Things will go wrong; when they do, developers are blocked. Ideally someone on the team or close to it can immediately help.
  • Prioritize developer experience: Getting these things “just right” is an ongoing project, so make sure to account for DevEx in your sprint planning.
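
As a concrete sketch of the first two points, a development overlay in Kustomize could look something like this. The file layout and the furby Deployment name are hypothetical, not our actual configuration:

# overlays/dev/kustomization.yaml (hypothetical)
# Layers development-specific settings on top of the same base manifests
# that production uses, instead of maintaining a second copy.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
patches:
  - target:
      kind: Deployment
      name: furby
    patch: |-
      - op: replace
        path: /spec/replicas
        value: 1    # one replica is plenty in development

With something like this in place, a one-step deployment boils down to a single command such as kustomize build overlays/dev | kubectl apply --namespace my-project-$USER -f -, which developers can rerun as often as they need.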

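And here is what the rendered Ingress for Erica's copy of Furby could look like. In practice the hostname and namespace would be templated with Helm or Kustomize rather than hard-coded, and the wildcard TLS secret name is an assumption:

# Hypothetical Ingress for Erica's instance, assuming the wildcard DNS record
# and certificate for *.furby.dev.company.com already exist in the cluster.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: furby
  namespace: furby-erica                    # per-developer namespace
spec:
  tls:
    - hosts:
        - erica.furby.dev.company.com
      secretName: furby-dev-wildcard-tls    # shared wildcard certificate
  rules:
    - host: erica.furby.dev.company.com     # unique per-developer hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: furby
                port:
                  number: 80
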
Pitfalls for remote Kubernetes development

Here are some things to avoid — lessons we’ve had to learn the hard way.

  • Avoid attaching workloads in ephemeral namespaces to persistent volumes. Even if it’s meant to be a long-running development namespace, you may need to delete it at some point — and then things tend to go south. Ephemeral and stateful don’t go well together. Consider using Kubernetes VolumeSnapshots to quickly restore development databases from a shared volume (see the sketch after this list).
  • Resource requests and limits for CPU and memory will differ from production, and getting them right is a constant chore. Goldilocks is an open source tool that helps you identify sensible settings automatically.
  • If all your developers spin up their environments at the same time, say in the morning, you may run into scheduling issues. Autoscaling helps, but there’s always a bit of lag.
  • Even if you use a managed cloud solution (such as Amazon Elastic Kubernetes Service, Google Kubernetes Engine, or Azure Kubernetes Service), there will still be maintenance work, for example around version upgrades. On the flip side, this is a great way to catch issues before they happen in production.
  • Sometimes things just don’t work. We’ve seen issues with I/O saturation on Azure or the Container Network Interface (CNI) plugin failing to assign IP addresses on AWS. Sometimes it’s K8s; sometimes it’s you. But it does highlight the importance of having someone available to help and a simple and idempotent redeployment process.

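To expand on the VolumeSnapshot point above, the pattern looks roughly like the sketch below. It assumes a CSI driver with snapshot support and a VolumeSnapshotClass named csi-snapclass (both illustrative), and note that without alpha feature gates the snapshot and the PVC restored from it have to live in the same namespace:

# Hypothetical sketch: snapshot a seeded database volume once...
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: postgres-seed
spec:
  volumeSnapshotClassName: csi-snapclass
  source:
    persistentVolumeClaimName: postgres-seed-data    # the pre-seeded volume
---
# ...then restore a fresh, disposable copy whenever the environment is
# (re)created, so deleting the namespace never loses the only copy of the data.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  dataSource:
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
    name: postgres-seed

Restoring from a snapshot is typically much faster than re-seeding a database from scratch, and it keeps the ephemeral namespace truly disposable.
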
Conclusion: Remote development environments for all

At Garden, we’ve been developing against fully remote environments for over a year. It has come with its share of headaches. But has it been worth it? Absolutely!

Every bump on the road has taught us something and made us even more prepared to operate our system in production.

If you’re thinking about developing in a remote Kubernetes cluster, I encourage you to check out Garden. It allows you to automate away a lot of the pain of K8s with one config to deploy your whole stack in dev, QA and production, eliminating configuration drift and enabling your devs to spin up production-like environments for development and CI. 

I’d also like to give a big shoutout to the SRE team at Garden for their continuous work on improving the experience and to the developers for their patience and good humor when our clusters are having a bad day.
