
Performance testing on a microservice architecture

Albert Kostusev
March 22, 2023

Bottlenecks and performance issues usually surface only once peak production loads hit. If you want to prevent such issues, know your maximum capacity, or just minimize the cost of running your microservices, performance testing is what you need.

This article will go over benefits, general solutions, and designs for such systems but won’t be a guide for building them.

We’ll cover the what, why, and when, but not how.

Note: This is a subjective guide mostly based on my personal experience!

What is performance testing?

Performance testing is simply measuring how efficient something is. For software, that means resources (CPU, GPU, memory, network, etc.) used per unit of work done.

Let’s imagine you run a resource-intensive IT service, for example, an AI image enhancement platform. Its performance could be expressed as the number of images processed divided by the infrastructure cost of processing them.
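
As a rough illustration of that ratio, here is a minimal Python sketch. The function name and the numbers are made up for the example and are not measurements from a real service.

```python
def images_per_dollar(images_processed: int, infra_cost_usd: float) -> float:
    """Work done per unit of spend: higher means more efficient."""
    if infra_cost_usd <= 0:
        raise ValueError("infra_cost_usd must be positive")
    return images_processed / infra_cost_usd


# Hypothetical numbers for two test runs over the same input set.
baseline = images_per_dollar(images_processed=120_000, infra_cost_usd=400.0)
candidate = images_per_dollar(images_processed=120_000, infra_cost_usd=320.0)
print(f"baseline: {baseline:.1f} images/$, candidate: {candidate:.1f} images/$")
```

Tracking a single number like this over time makes it easy to see whether a change made the service cheaper or more expensive to run.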

Performance testing usually means running an overwhelming amount of load on a system or its parts to find bottlenecks, instability, or wasted resources. It can be performed in two ways, depending on how resources are allocated.

  • resources are statically allocated - test at a set resource level
  • resources are dynamically allocated - test the absolute maximum load the system can handle

In practical terms, performance testing of this image enhancement service of ours would mean making it process a large number of images. It’s similar to running production traffic through it while measuring resource usage and monitoring system health.

When to start thinking about performance testing

For most software projects, performance testing makes little sense. Resource bills usually make up too small a portion of overall costs to be worth optimizing. Instability is usually caused by bugs and misconfiguration rather than scaling issues.

That said, if your project does require a lot of computing power and a major chunk of income goes to your cloud provider, or you often experience instability due to scaling issues, performance testing might aid you in your troubles.

Where to get the data from?

You probably realize you need some kind of input for the system to crunch on. The closer the input is to real-life production data, the better the results.

Theoretically, an easy way out would be to feed real production data into the testing system. But this might prove difficult due to privacy and compliance concerns. Here are a few tips for working around privacy problems when you use production data for performance testing:

  • Present a compelling business case to management. Using production data will help cut expenses, and getting optimization-loving management on your side will help in your fight against the security and privacy people.
  • Don't make the data available to everyone. Most people will be interested in resource numbers and processing metrics and don't need to know what exactly is being processed. Limiting access is an easy way to limit privacy liabilities.
  • If you have a large enough flow of prod data, just mirror some of it for testing purposes and make sure to never store it long-term.
  • Allow users to offer production data voluntarily to improve the service.
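
To make the mirroring tip a bit more concrete, here is a minimal Python sketch of deterministic sampling. It assumes every message has a string ID. The `should_mirror` name and the percentage-based approach are my own illustration, not something a particular queue system provides.

```python
import hashlib


def should_mirror(message_id: str, sample_percent: int) -> bool:
    """Pick roughly sample_percent% of messages, always the same ones."""
    if not 0 <= sample_percent <= 100:
        raise ValueError("sample_percent must be between 0 and 100")
    # Hash the ID so the decision is stable across restarts and replicas.
    digest = hashlib.sha256(message_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < sample_percent


# Hypothetical usage: mirror about 5% of traffic to the test environment.
mirrored = [f"msg-{i}" for i in range(1000) if should_mirror(f"msg-{i}", 5)]
print(f"mirrored {len(mirrored)} of 1000 messages")
```

Hashing the ID rather than rolling a random number means a given message is either always mirrored or never mirrored, which keeps test runs reproducible.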

Using real production data has its downsides. You’ll have to work around security and privacy concerns and might drown in the IAM soup. In some cases, it might be better to find another source of data for testing.

Here are a few alternatives to using production data:

  • submit your own legitimate data
  • generate the data
  • scrape the data
  • buy the data from users or from a firm specializing in creating it

Once you have goods to feed into the machine, we can proceed with the testing process. First, we’ll take a look at classic stress testing. Then, we’ll further explore the benefits of testing individual microservices.

Stress testing the entire stack

Most of the benefits of performance testing can also be achieved by load or stress testing your entire stack (or the part responsible for the heavy lifting).

Usually, you can conduct such tests by creating an environment as close to production as possible and firing a massive load at it. If the service does compute-intensive operations, some kind of queue system is often used so that work waits in line instead of overloading the system when it can't quite handle the incoming load.
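
The queue idea can be shown in miniature with Python's standard library. This is only an in-process sketch. In a real stack the queue would be a message broker and the workers would be separate services. The function and parameter names are hypothetical.

```python
import queue
import threading

_STOP = object()  # sentinel telling a worker there is no more work


def run_with_bounded_queue(jobs, process, max_pending=10, workers=2):
    """Push jobs through a bounded queue; the producer waits when it is full."""
    pending = queue.Queue(maxsize=max_pending)
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            job = pending.get()
            if job is _STOP:
                return
            out = process(job)
            with lock:
                results.append(out)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for job in jobs:
        pending.put(job)  # blocks while max_pending jobs are already waiting
    for _ in threads:
        pending.put(_STOP)
    for t in threads:
        t.join()
    return results


# Hypothetical load: 100 jobs, never more than 5 waiting at once.
done = run_with_bounded_queue(range(100), lambda n: n * n, max_pending=5, workers=4)
print(f"processed {len(done)} jobs")
```

The point is the `maxsize` limit: when consumers fall behind, the producer slows down instead of the consumers falling over.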

Static environment testing

A static environment with a limited amount of resources will allow you to fine-tune replica numbers until you’ve reached a good equilibrium. Additionally, it will reveal bottlenecks and hopefully give data on the resource usage of individual services. A well-executed static environment stress test will also give a metric of resource efficiency. Historic data on such metrics can be extremely useful for management and developers alike.
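
Once you have per-replica throughput numbers from a static test, a first guess at replica counts is simple arithmetic. The Python sketch below assumes throughput scales linearly with replicas, which real services rarely do perfectly. The service names and rates are invented for the example.

```python
import math


def replicas_for_target(per_replica_rate: dict[str, float], target_rate: float) -> dict[str, int]:
    """Replicas each service needs to keep up with target_rate items per second."""
    return {
        name: math.ceil(target_rate / rate)
        for name, rate in per_replica_rate.items()
    }


# Hypothetical measurements: images per second handled by one replica.
measured = {"resize": 50.0, "enhance": 4.0, "encode": 20.0}
print(replicas_for_target(measured, target_rate=40.0))
```

Treat the result as a starting point for the next test run rather than a final answer. The service that needs the most replicas is usually the bottleneck worth optimizing first.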

Dynamic environment testing

A test on a dynamic environment (one that scales with load increase) will provide data regarding your maximum possible load and the problems that occur as loads increase. This test should be conducted before working with a customer who you expect to double your daily load.

Issues that come up with such tests are often infra-related. In my personal experience, I’ve seen things like AWS running out of GPU nodes in a region or having a maxed-out number of IPs in a Kubernetes cluster. Often, artificial limits for APIs and infrastructure come up. Such issues can be extremely tedious and hard to fix, especially if they come up in production all at once.

If you don’t have dynamic scaling, you can imitate it by scaling your infra manually.

Load testing individual microservices

Put simply, testing an individual service is carried out in exactly the same way as testing the entire stack: you feed it input and measure how fast output is served while collecting various metrics. In practice, such systems are a bit tricky to build and can be especially difficult to automate.

Nevertheless, it’s an extremely useful exercise. Even if it’s executed partially or imperfectly, it’s worth doing for the most resource-intensive or problematic services.

Message queue and Kubernetes-based example

  1. Run X amount of load through your testing environment, copying all messages to duplicate queues without consumers
  2. Scale down all consumers (deployments)
  3. Reset caches and databases
  4. Either reconfigure services to use the new queues or shovel the messages back into the real queues
  5. Scale services one by one, letting them consume and process their queues
  6. Measure resource usage percentiles, network calls, and time to process the input
  7. Calculate the capacity or resource efficiency of each individual service
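
Steps 6 and 7 come down to a bit of arithmetic on the collected samples. Here is a minimal Python sketch using only the standard library. It assumes you have already exported the samples (for example, CPU usage per scrape interval) from your monitoring system. The function names and numbers are hypothetical.

```python
import statistics


def summarize(samples: list[float]) -> dict[str, float]:
    """Return common percentiles of a list of resource-usage samples."""
    # quantiles(n=100) returns 99 cut points; index 49 is the 50th percentile.
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98], "max": max(samples)}


def capacity_per_replica(messages: int, elapsed_seconds: float, replicas: int) -> float:
    """Messages per second that one replica handled during the run."""
    return messages / elapsed_seconds / replicas


# Hypothetical run: 6000 messages consumed in 120 seconds by 5 replicas.
cpu_millicores = [float(v) for v in range(200, 301)]
print(summarize(cpu_millicores))
print(capacity_per_replica(messages=6000, elapsed_seconds=120.0, replicas=5))
```

Percentiles are more useful than averages here, since a service that idles most of the time but spikes occasionally still needs resources sized for the spikes.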

Such tests could be set to run weekly. Or, if you save the results from steps 1-4, you can reuse the data and incorporate tests into the CI pipeline of each service. This will help to catch performance-related issues very early in the development cycle.
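
A CI check along these lines can be very small. The Python sketch below assumes you store a baseline throughput number from a previous run and compare each new run against it. The 10% tolerance and the function name are arbitrary choices for illustration.

```python
def within_tolerance(baseline: float, current: float, tolerance: float = 0.10) -> bool:
    """True if current throughput has not dropped more than tolerance below baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return current >= baseline * (1.0 - tolerance)


# Hypothetical numbers: messages per second from the stored baseline and this run.
baseline_rate = 100.0
current_rate = 85.0
if within_tolerance(baseline_rate, current_rate):
    print("performance check passed")
else:
    print(f"throughput regressed: {current_rate} vs baseline {baseline_rate}")
```

In a pipeline you would exit with a non-zero status in the failing branch so the build is marked as failed. Some tolerance is needed because test runs are noisy, and too tight a threshold leads to flaky builds.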


In some cases, it makes sense to do performance testing. Usually, this implies running a large load through the system to gather data and find issues.

Performance testing with Garden

Garden is an open-source tool that aids with development and testing on Kubernetes. If you are struggling with running E2E tests on your complex Kubernetes stack or want to give performance testing a try but don’t have the automation to set up the necessary production-like environments, give us a try!
