We reported a security issue in AWS CDK's eks.Cluster component

Anna Mager
Steffen Neubauer
June 6, 2023

We recently discovered a security issue in AWS CDK, the open-source cloud development kit that can be used to deploy and manage infrastructure as code using different programming languages.

After reporting the issue to the AWS Security Team, the AWS CDK team swiftly started working on a solution and published fixed versions of AWS CDK, v2.80.0 and v1.202.0, on May 20th. You are affected if you are using older versions of the AWS CDK to create EKS (Kubernetes) clusters.

AWS CDK brings a lot to the table as it allows engineers to write infrastructure as code in any of 5 popular programming languages, using a vast library of abstractions called “Constructs”.

Constructs provide functionality on different levels of abstraction: some represent only a single AWS resource, like an S3 bucket or an IAM role, while others are entire solutions with well-architected defaults.

The output of a CDK program is a CloudFormation template, which in turn can be used to actually create or update the AWS resources on your AWS accounts.

The security issue we discovered was present in the <span class="p-color-bg">eks.Cluster</span> component since 2020 – around three years ago at the time of writing. If you want to jump to the code straight away, this line of code is the culprit:

// the role used to create the cluster. this becomes the administrator role
// of the cluster.
const creationRole = new iam.Role(this, 'CreationRole', {
  assumedBy: new iam.AccountRootPrincipal(),
});

If this doesn’t already give away the entire story, read on: in this blog post, we explain the vulnerability, how and why we found it, why it matters to you, and what you need to do to resolve the issue if you are affected.

We will also share our experience working with the AWS Security team to report and resolve the issue.

What’s going on here 🔎

Let’s zoom out a little bit and first look at the functionality of the <span class="p-color-bg">eks.Cluster</span> component.

The <span class="p-color-bg">eks.Cluster</span> component performs significant heavy lifting for the user: Not only can it help to create and manage an EKS cluster, which is an AWS-managed Kubernetes cluster, it can also be used to install custom Kubernetes manifests on the EKS cluster. This functionality is implemented under the hood using CloudFormation Custom Resources.

Custom resources are Lambda functions that receive events like “Create”, “Update” or “Delete” from the CloudFormation service, and in turn execute appropriate infrastructure actions.
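To make this event flow a bit more concrete, here is a minimal sketch of how such a handler might dispatch on the request type. The event shape is reduced to the fields used here, and the `action` field and the `manifest-` naming scheme are hypothetical, chosen only for illustration; this is not the handler code that the CDK ships.

```typescript
// Reduced shape of a CloudFormation custom resource request
// (only the fields this sketch uses).
interface CustomResourceEvent {
  RequestType: "Create" | "Update" | "Delete";
  LogicalResourceId: string;
  PhysicalResourceId?: string; // provided by CloudFormation on Update and Delete
  ResourceProperties: Record<string, unknown>;
}

// Hypothetical result type: "action" only illustrates which
// infrastructure action a real handler would carry out.
interface HandlerResult {
  PhysicalResourceId: string;
  action: "apply" | "delete";
}

function handleEvent(event: CustomResourceEvent): HandlerResult {
  // On Create there is no physical ID yet, so the handler picks one
  // (the naming scheme here is an assumption).
  const physicalId =
    event.PhysicalResourceId ?? `manifest-${event.LogicalResourceId}`;

  // Create and Update both end up applying the manifest;
  // Delete removes it again.
  const action = event.RequestType === "Delete" ? "delete" : "apply";

  return { PhysicalResourceId: physicalId, action };
}
```

A real handler would call the Kubernetes API at the point where this sketch only records the action, which is why it needs a role with access to the cluster.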

For a Lambda function to be able to install a Kubernetes Manifest in an EKS cluster, it will need permission to assume a role that grants it permission to access the Kubernetes API.

To facilitate creating the EKS cluster and then deploying resources on top of it, the Cluster component will create an IAM role called <span class="p-color-bg">creationRole</span> which has full access to the created EKS cluster and the permission to update or delete it.

That’s pretty awesome actually, because it makes it really easy to set up an EKS cluster and run complex applications on top of it, all in a few simple lines of JavaScript or Python that are easy to read and understand.

const cluster = new eks.Cluster(this, 'my-cluster', {
    clusterName: `my-cluster`,
    version: eks.KubernetesVersion.V1_23,
    defaultCapacity: 2
});

cluster.addHelmChart('NginxIngress', {
  chart: 'nginx-ingress',
  repository: 'https://helm.nginx.com/stable',
  namespace: 'kube-system',
});

The trust policy of this <span class="p-color-bg">creationRole</span> was set to allow the AWS account root principal to assume it.

This is the entire problem: it means that every identity in the AWS account with the <span class="p-color-bg">sts:AssumeRole</span> permission is allowed to assume this role, not just the Lambda functions in the CloudFormation template that actually need the permission.

Even worse, AWS CDK for EKS offers ways to manage access to the created EKS cluster, but the undocumented <span class="p-color-bg">creationRole</span> undermined all of these access management facilities, because any identity in the account with the <span class="p-color-bg">sts:AssumeRole</span> permission could assume it.

Why we dug so deep ⛏️

At Garden we are committed to transforming your developer experience by making it feel as if your entire stack of services (we call it the Stack Graph!) runs on your local laptop, while actually running on a production-like remote Kubernetes cluster.

As Site Reliability Engineers here at Garden, we were working on an AWS quickstart solution to simplify the process of setting up the remote development cluster. The goal is to boil down the entire setup process to one click.

Our development cluster solution aims to enable you to use all major Garden features right away:

  • Autoscaling to support any Engineering team size and workload requirements
  • Fast builds and quiet fans using Garden’s in-cluster building and AWS ECR repositories
  • The ability to deploy a separate instance of your app for every developer, branch, CI run, staging environment (whatever you need!) using a wildcard Route53 record, all with SSL using AWS Certificate Manager, a Load Balancer, as well as an Ingress Controller.

If we made you curious about our solution, or you think this might be exactly what you were looking for, feel free to give it a spin at https://github.com/garden-io/garden-aws-quickstart.

Because many of our users also use AWS, and it’s possible to deploy CloudFormation templates using one click, we decided to use CloudFormation as a means to distribute this solution.

Due to the complexity of AWS EKS, writing CloudFormation code manually to do all this is not easily achievable. That’s why we decided to use the AWS CDK and the AWS EKS blueprint as a foundation for our custom solution.

We wanted to make sure we created a well-architected and secure solution for you, so we took extra care with access management when we added the ability to specify IAM users and roles to the CloudFormation stack, and we double-checked all the IAM roles and their policies.

The discovery 💎

We realized that our CloudFormation stack was not only creating the roles we specified, but also additional ones that are required for its internal functioning.

This in and of itself is completely fine, but while we were optimizing the privileges, we stumbled upon the so-called <span class="p-color-bg">creationRole</span>.

The <span class="p-color-bg">creationRole</span> has the privileges to edit or delete the EKS cluster, to create service-linked roles, to describe several properties of EC2 instances, and to perform encrypt and decrypt operations on a KMS key. The Kubernetes <span class="p-color-bg">aws-auth</span> ConfigMap, which is used to map IAM identities to Kubernetes RBAC roles, grants it the <span class="p-color-bg">system:masters</span> role.

The <span class="p-color-bg">system:masters</span> role in Kubernetes can access all namespaces and resources in the cluster. It basically grants super-admin privileges to the created EKS cluster.

All of this makes sense, because CloudFormation will need to do all of these things including managing the <span class="p-color-bg">aws-auth</span> ConfigMap and deploying services to the cluster.
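As an illustration, the role mappings in the <span class="p-color-bg">aws-auth</span> ConfigMap can be thought of as a list of entries, each with a role ARN, a username and a list of groups. The sketch below models such entries and lists the roles mapped to <span class="p-color-bg">system:masters</span>. The role names, account ID and usernames are made up for this example, and the helper is hypothetical, not part of the CDK.

```typescript
// One role mapping, modelled on the fields of an aws-auth "mapRoles" entry.
interface MapRoleEntry {
  rolearn: string;
  username: string;
  groups: string[];
}

// Hypothetical helper: returns the ARNs of all roles that are mapped
// to system:masters and therefore have full access to the cluster.
function rolesWithClusterAdmin(mapRoles: MapRoleEntry[]): string[] {
  return mapRoles
    .filter((entry) => entry.groups.some((group) => group === "system:masters"))
    .map((entry) => entry.rolearn);
}

// Example data: ARNs and usernames are invented for illustration.
const exampleMapRoles: MapRoleEntry[] = [
  {
    rolearn: "arn:aws:iam::111122223333:role/ExampleCreationRole",
    username: "example-creation-role",
    groups: ["system:masters"],
  },
  {
    rolearn: "arn:aws:iam::111122223333:role/ExampleNodeRole",
    username: "system:node:{{EC2PrivateDNSName}}",
    groups: ["system:bootstrappers", "system:nodes"],
  },
];

const clusterAdmins = rolesWithClusterAdmin(exampleMapRoles);
```

Anyone who is able to assume one of the roles returned by this helper effectively has full access to the cluster, regardless of what else the ConfigMap says.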

However, what struck us as odd was the trust policy of the <span class="p-color-bg">creationRole</span>. A trust policy determines who can assume a given role, and in this case it was set to the account root principal:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::111122223333:root"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

This has the effect that every identity in the account with the <span class="p-color-bg">sts:AssumeRole</span> permission can assume this role.
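If you want to look for this pattern in your own roles, a check along the following lines could be a starting point. This is a deliberately simplified sketch, not a full IAM policy evaluator: it ignores conditions, wildcard actions and non-standard partitions, and the function and type names are our own for this example.

```typescript
// Reduced model of an IAM trust policy document.
interface TrustStatement {
  Effect: "Allow" | "Deny";
  Principal: { AWS?: string | string[]; Service?: string | string[] };
  Action: string | string[];
}

interface TrustPolicy {
  Version: string;
  Statement: TrustStatement[];
}

// IAM allows both a single string and a list in these fields.
function toArray(value?: string | string[]): string[] {
  if (value === undefined) return [];
  return Array.isArray(value) ? value : [value];
}

// The account root principal can be written as a root ARN
// or as a bare 12-digit account ID.
function isAccountRoot(principal: string): boolean {
  return /^arn:aws:iam::\d{12}:root$/.test(principal) || /^\d{12}$/.test(principal);
}

// True if any Allow statement lets the account root principal assume the role.
function trustsAccountRoot(policy: TrustPolicy): boolean {
  return policy.Statement.some(
    (statement) =>
      statement.Effect === "Allow" &&
      toArray(statement.Action).some((a) => a.toLowerCase() === "sts:assumerole") &&
      toArray(statement.Principal.AWS).some(isAccountRoot)
  );
}
```

Passing the trust policy shown above to `trustsAccountRoot` flags it, while a policy that only trusts a specific role ARN or the Lambda service principal is not flagged.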

At first we assumed this issue was part of the EKS blueprints pattern, but upon digging a bit deeper we realized that it can be traced back all the way to the AWS CDK code.

And here, again, is the culprit of the issue, a single line of code, which hopefully makes a little more sense to you now:

// the role used to create the cluster. this becomes the administrator role
// of the cluster.
const creationRole = new iam.Role(this, 'CreationRole', {
  assumedBy: new iam.AccountRootPrincipal(), // <= this is the culprit
});

Why this matters

The existence of the <span class="p-color-bg">creationRole</span> together with this overly permissive trust policy undermines all the access controls put in place when carefully managing the <span class="p-color-bg">aws-auth</span> ConfigMap, as any identity (e.g. a user or another role) with the <span class="p-color-bg">sts:AssumeRole</span> permission in the AWS account will be able to access that cluster with <span class="p-color-bg">system:masters</span> permissions.

This has far-reaching implications. Any such identity in the account would be able to add more users or roles to the cluster, remove users’ or roles’ access to the cluster, deploy any kind of application, and decrypt secrets in the cluster. Decrypting secrets in the EKS cluster could potentially broaden the blast radius of this issue to other services inside or outside AWS.

Since this is part of the AWS CDK’s EKS construct implementation, it affects every EKS cluster created using the AWS CDK.

We believe that the fact that it is completely hidden from the user increases the severity. The permissions and security guide in the CDK EKS documentation outlines several mechanisms for securing the cluster and granting permissions to IAM users and roles. The <span class="p-color-bg">creationRole</span> and its capabilities are not mentioned. If you are using the CDK to create EKS clusters, you are probably not aware that this role exists.

Even once you are aware of its existence, there is no way to avoid its creation, since it is deep in the machinery of the EKS construct and not configurable. The role exists for the whole lifecycle of your CDK stack.

What happened next

After going back and forth trying to secure the trust policy of the <span class="p-color-bg">creationRole</span> from our code, we came to the conclusion that this is not an issue with our code, but with the CDK itself.

So we wrote an e-mail to aws-security@amazon.com. Amazon takes security seriously and offers a way for security researchers to report vulnerabilities.

Amazon kept to its SLA, and we received an answer within 24 hours. They soon confirmed that AWS was treating this as a valid security issue. They were transparent, started working on a fix right away and kept us posted.

We stayed in close contact with AWS Security to make sure that this blog post would not be published before the fix had been released.

Overall, we think it was a pleasure to work with the AWS Security team to solve this issue and collaborate on this blog post.

Full timeline

Date & summary

Apr 18, 2023, 7:38 PM: Initial Report sent to AWS Security [1]

Apr 19, 2023, 12:44 PM: Confirmation received from AWS Security: Will investigate

Apr 21, 2023, 4:18 PM: Update received from AWS Security: Still investigating

Apr 28, 2023, 4:15 PM: Update received from AWS Security: Working on a fix

May 11, 2023, 7:04 PM: Update received from AWS Security: Working on a fix

May 17, 2023, 5:48 PM: Update received from AWS Security: Planning to notify affected customers

May 24, 2023, 7:06 PM: Report sent to AWS Security that the issue has not yet been resolved in EKS Blueprints [2]

May 24, 2023, 7:49 PM: Confirmation received from AWS Security: Service Team has been engaged on the issue

May 26, 2023, 7:26 PM: Update received from AWS Security: Working on updating AWS CDK library in EKS Blueprints

May 30, 2023, 10:24 PM: Update received from AWS Security: Issues have been resolved in latest versions of both aws-cdk and cdk-eks-blueprints as of 2023-05-26

The solution

While the essence of the problem is one line of code, solving it in such a foundational component was much more complicated and required serious engineering efforts on AWS’s side.

At first glance, the solution looks quite straightforward. The trust policy is changed so that the <span class="p-color-bg">creationRole</span> can only be assumed by the specific roles that need to:

// the role used to create the cluster. this becomes the administrator role
// of the cluster.
const creationRole = new iam.Role(this, 'CreationRole', {
  // the role would be assumed by the provider handlers, as they are the ones making
  // the requests.
  assumedBy: new iam.CompositePrincipal(
    provider.provider.onEventHandler.role!,
    provider.provider.isCompleteHandler!.role!
  ),
});

What apparently complicated the implementation is the fact that the <span class="p-color-bg">kubectl</span> Lambda function is implemented within a <span class="p-color-bg">cdk.NestedStack</span>:

// note that 'scope' is used here intentionally because we have to create the role
// outside of the nested stack, so that it can be used in the trust policy of the
// creation role.
const handlerRole = cluster.kubectlLambdaRole ?? new iam.Role(scope, 'KubectlHandlerRole', {
  assumedBy: new iam.ServicePrincipal('lambda.amazonaws.com'),
  managedPolicies: [iam.ManagedPolicy.fromAwsManagedPolicyName('service-role/AWSLambdaBasicExecutionRole')],
});

And then, integration testing required importing thousands of lines of updated code from <span class="p-color-bg">@aws-cdk-testing/framework-integ/test</span>.

The fix has been released with the AWS CDK versions v2.80.0 and v1.202.0 on May 20, 2023.

Amazon not only improved the <span class="p-color-bg">creationRole</span>, but also the <span class="p-color-bg">mastersRole</span>: if not specified by the user, the <span class="p-color-bg">mastersRole</span> also used an overly permissive trust policy, with the difference that it was documented and the user could override the behaviour.

With these versions an overly permissive trust policy is not the default anymore, and we are really happy that this makes <span class="p-color-bg">eks.Cluster</span> now safe by default.

If you are using AWS CDK versions prior to v2.80.0 or v1.202.0 and the <span class="p-color-bg">eks.Cluster</span> component (directly or indirectly), we strongly recommend updating your CDK version and deploying the new version of your stack with the updated component.
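To check a version string, such as the one in your package.json, against the fixed releases, a small comparison like the following may help. It is a sketch under simplifying assumptions: it only knows the two fixed versions named in this post, ignores pre-release suffixes, and the function names are ours, not part of the CDK.

```typescript
// Fixed versions named in this post, keyed by major version.
const FIXED_VERSIONS: Record<number, [number, number, number]> = {
  1: [1, 202, 0],
  2: [2, 80, 0],
};

// Parses "2.79.1" or "v2.79.1" into [major, minor, patch].
// Anything after the patch number (e.g. "-rc.1") is ignored.
function parseVersion(version: string): [number, number, number] {
  const match = /^v?(\d+)\.(\d+)\.(\d+)/.exec(version.trim());
  if (!match) {
    throw new Error(`Not a semantic version: ${version}`);
  }
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// True if the given CDK version is older than the fixed release
// of the same major version.
function predatesFix(version: string): boolean {
  const [major, minor, patch] = parseVersion(version);
  const fixed = FIXED_VERSIONS[major];
  if (!fixed) {
    throw new Error(`No fixed version known for major version ${major}`);
  }
  if (minor !== fixed[1]) {
    return minor < fixed[1];
  }
  return patch < fixed[2];
}
```

For example, `predatesFix("2.79.1")` and `predatesFix("1.201.0")` both return true, while `predatesFix("v2.80.0")` returns false. A true result only tells you that the version predates the fix; you are only affected if your stack also uses the <span class="p-color-bg">eks.Cluster</span> component.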

Conclusion

Overall, this was a really interesting and exciting process for us. We were very happy with the communication with AWS about this issue, and we feel that AWS resolved it really quickly.

Just in case you’ve gotten curious about the Garden AWS Quickstart solution, feel free to give it a spin on GitHub at https://github.com/garden-io/garden-aws-quickstart – and also stay tuned as we will soon accompany it with another blog post.

If you have any feedback regarding this post or would like to join the conversation in our community, you are very welcome to join us on Discord at https://go.garden.io/discord.

Footnotes

[1] Initial report sent to AWS

From: Steffen Neubauer *@garden.io

To: AWS Security <aws-security@amazon.com>

CC: Anna Mager *@garden.io

Dear AWS Security Team,

My colleague Anna Mager and I discovered the following security vulnerability. We kindly bring this to your attention in this Email.

Quick summary

CDK: EKS cluster adminRole grants excessive permissions to any principal in the account

Affected Open Source projects

Steps to reproduce the issue

When creating an EKS cluster using the AWS CDK libraries, as following:

const clusterAdmin = iam.Role.fromRoleArn(this, 'arn:aws:iam::xxx:role/aws-reserved/sso.amazonaws.com/eu-central-1/AWSReservedSSO_AdministratorAccess_xxx');

const cluster = new eks.Cluster(this, 'demogo-cluster', {
  clusterName: `demogo`,
  mastersRole: clusterAdmin,
  version: eks.KubernetesVersion.V1_18,
  defaultCapacity: 2
});

cluster.addAutoScalingGroupCapacity('spot-group', {
  instanceType: new ec2.InstanceType('m5.xlarge'),
  spotPrice: cdk.Stack.of(this).region == primaryRegion ? '0.248' : '0.192'
});

For an exhaustive step-by-step guide, please refer to the official EKS CDK workshop at https://catalog.us-east-1.prod.workshops.aws/workshops/c15012ac-d05d-46b1-8a4a-205e7c9d93c9/en-US/40-deploy-clusters/200-cluster/210-cluster

Observed behaviour

  • The mastersRole will be added to the EKS cluster's auth ConfigMap, as expected
  • Roles associated with Lambda functions that are providers for EKS CustomResources have access to the AWS resources related to the EKS cluster, as expected
  • Any principal in the same account with the sts.AssumeRole has permission to assume the "CreationRole" – see also cluster.adminRole – This is unexpected! This indirectly grants excessive permissions to all principals with sts.AssumeRole – for example they all can delete the EKS cluster, and use the KMS key to decrypt or encrypt.

See also the screenshots attached to this Email[2].

See also the culprit lines of code on GitHub:

  • https://github.com/aws/aws-cdk/blob/0d156a810a7a049e03f2d84582f12b7a231dea2e/packages/aws-cdk-lib/aws-eks/lib/cluster-resource.ts#L122
  • https://github.com/aws/aws-cdk/blob/0d156a810a7a049e03f2d84582f12b7a231dea2e/packages/aws-cdk-lib/aws-eks/lib/cluster-resource.ts#L60

Expected behaviour

  • The mastersRole will be added to the EKS cluster's auth ConfigMap
  • Roles associated with Lambda functions that are providers for EKS CustomResources have access to the resources related to the EKS cluster
  • Other untrusted principals in the account cannot assume a role created by the CDK code above and do not have excessive permissions (e.g. modify or delete) to the AWS resources related to the EKS cluster

Steps to remediate the issue

The trust relationship of the "CreationRole" should only grant permission to assume it to the role associated with the Lambda function that provides the custom resources that need access to the EKS cluster, e.g. to perform kubectl operations.

Workarounds

We did not find a working workaround yet. We also did not find documentation that makes users aware of this behaviour. If you are aware of a workaround other than not using the CDK eks.Cluster library, please let us know.

Severity

The cdk eks blueprint library (affected by this issue as well) advertises that the project can be used to deploy well-architected EKS clusters and manage access permissions to the clusters[1].

Because

  • Any principals in the account have access to critical EKS resources like the KMS key and API endpoints like update EKS cluster, delete EKS cluster etc.
  • This contradicts with the advertised features

I personally feel this is a high severity issue, and it should be resolved within a reasonable time frame of 90 days.

Responsible disclosure

We reserve the right to disclose this issue at some point in the future (responsible disclosure). We will attempt to coordinate the date with you. We hope that this issue will be resolved until then.

Please let us know if there was any missing information, or if there were any errors in this report. We would like to stay in touch about this issue.

Thank you for the great work of keeping AWS secure in advance!

References

[1] According to https://github.com/aws-quickstart/cdk-eks-blueprints

[...]

Customers can use this QuickStart to easily architect and deploy a multi-team Blueprints built on EKS. Specifically, customers can leverage the eks-blueprints module to:

  • Deploy Well-Architected EKS clusters across any number of accounts and regions.

[...]

  • Define teams, namespaces, and their associated access permissions for your clusters.

[...]

[2] Screenshots of the "CreationRole":

<Screenshot>

[2] Report sent to AWS Security that the issue has not yet been resolved in EKS Blueprints

From: Steffen Neubauer *@garden.io

To: AWS Security <aws-security@amazon.com>

CC: Anna Mager *@garden.io

Hi R.,

We would like to inform you that the original issue reported in the email on April 18th has not been fully resolved.

Issue description

It has been resolved in the aws-cdk (https://github.com/aws/aws-cdk)

– But not downstream in cdk-eks-blueprints (https://github.com/aws-quickstart/cdk-eks-blueprints)

The blueprints depend on "aws-cdk": "^2.78.0" (see the package.json file at https://github.com/aws-quickstart/cdk-eks-blueprints/blob/main/package.json#LL24C28-L24C28)

– but the version fixed is 2.80.0 or 1.202.0

Workarounds

The consumers of the aws-quickstart cdk-eks-blueprint cannot update the cdk themselves, as it would result in type errors down the line (At least that is what we found so far in our experiments).

Steps to remediate the issue

It's important to update cdk-eks-blueprints to a fixed version and inform users of the blueprints of this issue as well.

There might be more downstream users of the CDK eks.Cluster component, but the cdk-eks-blueprints seem to be an open-source repository that is being maintained by AWS so it should be in scope for reports via aws-security@amazon.com as well.

Responsible disclosure

Because this information was already present in the original report from April 18th we expect it to be fixed in the same 90 day window (until 18th of June).

Thank you for keeping AWS and your customers secure!

Best regards,

Steffen
