An introduction to kOps and setting up a Kubernetes cluster on AWS


In this post I’m going to introduce kOps (Kubernetes Operations): its concepts, why you should use it, its advantages, and an overview of its main commands. Then I will provision an AWS infrastructure using kOps, scale it up and down, and finally deploy a monitoring solution exposed through an ingress to test it.

What’s kOps?

Managing Kubernetes clusters across multiple regions and clouds presents a handful of incredibly complex challenges, and kOps is the easiest way to get a production-grade Kubernetes cluster up and running. Kubernetes Operations helps us create, destroy, upgrade and maintain production-grade, highly available Kubernetes clusters on cloud infrastructure. As Kubernetes administrators, we know how important it is to keep our clusters upgraded to versions patched against known vulnerabilities, and kops helps us accomplish that. Another challenge is provisioning a Kubernetes cluster on cloud infrastructure, because we have to deal with instance groups, Route 53, autoscaling groups, ELBs (for the API server), security groups, master bootstrapping, node bootstrapping, and rolling updates to the cluster; kops makes this work easier. In short, managing a Kubernetes cluster on AWS without any tooling is a complicated process, and I do not recommend it.

Kops is an open source tool and completely free to use, but you are responsible for paying for and maintaining the underlying infrastructure it creates to run your Kubernetes cluster. According to the official site, at the time of writing AWS (Amazon Web Services) is officially supported, with DigitalOcean, GCE and OpenStack in beta, and Azure and AliCloud in alpha.

kOps Advantages

Specifically on AWS, why use kops instead of a managed solution from the cloud provider, such as EKS?

One of the major advantages is that kops creates the cluster on EC2 instances, so you can access the nodes directly and customize them. As a result, you can choose which networking layer to use, choose the size of the master instances, monitor the master and worker nodes directly, and scale your infrastructure up and down just by editing a file. You can also choose between a highly available cluster and a single master, which might be desirable for dev and test environments where high availability is not a requirement.

kOps is also built on a state-sync model that supports dry-runs and automatic idempotency. This makes it natural to keep your cluster setup under version control, and it opens the door to GitOps with a pull model instead of a push model. If you like, kops can also generate Terraform configuration for your resources instead of creating them directly, which is a nice feature if you already use Terraform.
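To illustrate the Terraform option, here is a minimal sketch that wraps kops update cluster with --target=terraform and --out. The function name kops_to_terraform, the cluster name k8s.example.com and the ./terraform directory are hypothetical placeholders, and it assumes KOPS_STATE_STORE is already exported and the cluster is already registered in the state store.

```shell
#!/usr/bin/env bash
# Hypothetical helper: writes Terraform files for a cluster that is already
# registered in the kops state store, instead of creating AWS resources directly.
kops_to_terraform() {
  local cluster_name="$1"   # placeholder, e.g. k8s.example.com
  local out_dir="$2"        # directory that will receive the .tf files
  kops update cluster \
    --name "$cluster_name" \
    --target=terraform \
    --out="$out_dir"
}

# Usage (assumes KOPS_STATE_STORE is exported):
#   kops_to_terraform k8s.example.com ./terraform
#   cd ./terraform && terraform init && terraform plan
```

With this approach, Terraform (not kops) applies the changes, so the generated files can be reviewed in a pull request like any other code.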

According to the official site, kOps has the following features:

  • Automates the provisioning of Highly Available Kubernetes clusters

  • Built on a state-sync model for dry-runs and automatic idempotency

  • Ability to generate Terraform

  • Supports zero-config managed kubernetes add-ons

  • Command line autocompletion

  • YAML Manifest Based API Configuration

  • Templating and dry-run modes for creating Manifests

  • Choose from the most popular CNI Networking providers out-of-the-box

  • Multi-architecture ready with ARM64 support

  • Capability to add containers, as hooks, and files to nodes via a cluster manifest

kOps Overview

In this section, I will cover only the main commands that I consider important to provision and maintain a Kubernetes cluster with kOps. If you want to go further, see the official documentation.

kops create
kops create creates a resource like a cluster, instancegroup or a secret using command line parameters, YAML configuration specification files, or stdin.

For example, there are two ways of registering a cluster: using a cluster spec file or using CLI arguments.

If you would like to create a cluster in AWS with highly available masters, you can pass the master zones as command line parameters.

Alternatively, you can save your configuration to a file and apply it later, which makes it easy to keep under version control. As with kubectl, use --dry-run -o yaml in place of the --yes parameter.
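As a hedged sketch of both ideas together, the helper below creates an HA cluster spec (one master per zone listed in --master-zones, which is how kops spreads masters by default) and renders it to a YAML file with --dry-run -o yaml instead of creating anything. The function name, cluster name, zones and file name are hypothetical examples.

```shell
#!/usr/bin/env bash
# Hypothetical helper: renders an HA cluster spec to a YAML file without
# creating anything, so it can be reviewed and committed first.
render_ha_cluster_spec() {
  local cluster_name="$1"   # placeholder, e.g. k8s.example.com
  local zones="$2"          # comma-separated, e.g. us-west-2a,us-west-2b,us-west-2c
  local out_file="$3"       # e.g. cluster.yaml
  # Listing three zones in --master-zones spreads the masters across them.
  kops create cluster \
    --name "$cluster_name" \
    --zones "$zones" \
    --master-zones "$zones" \
    --node-count 3 \
    --networking calico \
    --dry-run -o yaml > "$out_file"
}

# Later, register the reviewed spec and create the resources:
#   kops create -f cluster.yaml
#   kops update cluster --name k8s.example.com --yes
```

Depending on the kops version, the update step may also ask you to register an SSH public key first.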

Once the cluster is in production, you can add nodes as needed. For instance, to add an instance group with the node role to a cluster, use the following command:

kops create ig node-example \
  --role node --subnet my-subnet-name

As mentioned, this only registers the resource; to actually create it, run kops update cluster --yes.
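The same two-step flow can be scripted. This minimal sketch registers the instance group without opening an editor (--edit=false) and then runs kops update cluster in preview mode only. The function name add_instance_group and the cluster name are hypothetical; node-example and my-subnet-name come from the command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: registers an instance group, then previews the change.
add_instance_group() {
  local cluster_name="$1"   # placeholder, e.g. k8s.example.com
  local ig_name="$2"        # e.g. node-example
  local subnet="$3"         # subnet name from the cluster spec
  # Registers the group in the state store only; --edit=false skips the editor.
  kops create ig "$ig_name" --name "$cluster_name" \
    --role Node --subnet "$subnet" --edit=false || return 1
  # Preview only: prints the AWS changes without making them.
  kops update cluster --name "$cluster_name"
}

# After reviewing the preview:
#   kops update cluster --name k8s.example.com --yes
```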

kops edit
As we saw, kops create cluster creates a cluster specification in the registry from CLI arguments. In most cases, you will want to edit the cluster spec with kops edit before actually creating the cloud resources. Once confirmed, run kops update cluster with the --yes flag to create the cluster, including its cloud resources. Even after the resources are running, you can use kops edit at any time and then apply the changes.

kops edit cluster --state=s3://my-state-store


kops edit instancegroup --name nodes --state=s3://my-state-store

S3 state
We will look at the S3 state in more detail later; for now, keep in mind that it stores the top-level config file, which holds the main configuration of your cluster, such as instance types, zones and other settings.
kops update cluster
kops update cluster creates or updates the cloud resources to match the cluster spec. If the cluster or cloud resources already exist, this command may modify them. As a precaution, it is safer to run it in preview mode first (kops update cluster with --name set to your cluster, without --yes). Once the output matches your expectations, apply the changes by adding --yes to the same command.

kops update cluster --state=s3://my-state-store --yes
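The preview-then-apply habit is easy to enforce with a small wrapper. This is a minimal sketch, assuming the cluster name is passed as an argument; the function name update_with_preview is hypothetical, and it only runs the --yes step after an explicit "y" on stdin.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: always previews first and applies only after an
# explicit "y" on stdin.
update_with_preview() {
  local cluster_name="$1"   # placeholder, e.g. k8s.example.com
  local answer
  # Preview: prints the planned changes without touching AWS.
  kops update cluster --name "$cluster_name" || return 1
  printf 'Apply these changes? [y/N] '
  read -r answer
  case "$answer" in
    y|Y) kops update cluster --name "$cluster_name" --yes ;;
    *)   echo "Skipped; nothing was applied." ;;
  esac
}
```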
kops get clusters
kops get clusters lists all clusters in the registry (state store). More generally, kops get displays one or many resources, such as clusters, instance groups and secrets.

kops get -o yaml
You can get resources with or without the YAML output format.

kops get secrets admin -oplaintext

kops delete cluster
kops delete cluster deletes the cloud resources (instances, DNS entries, volumes, ELBs, VPCs etc) for a particular cluster. It also removes the cluster from the registry.

As a precaution, it is safer to run it in preview mode first (kops delete cluster with --name set to your cluster, without --yes). Once the output matches your expectations, perform the actual deletion by adding --yes to the same command.

kops delete cluster --yes


To delete a single instance (node) from an active cluster:

kops delete instance ip-xx.xx.xx.xx.ec2.internal --yes
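Because cluster deletion cannot be undone, a script can add one more safeguard on top of the preview. This is a minimal sketch: the function name delete_cluster_safely is hypothetical, and requiring the cluster name to be typed back is my own convention, not a kops feature.

```shell
#!/usr/bin/env bash
# Hypothetical helper: previews the deletion, then asks you to type the
# cluster name before running the destructive step.
delete_cluster_safely() {
  local cluster_name="$1" typed
  # Preview: lists the resources that would be deleted.
  kops delete cluster --name "$cluster_name" || return 1
  printf 'Type the cluster name to confirm deletion: '
  read -r typed
  if [ "$typed" = "$cluster_name" ]; then
    kops delete cluster --name "$cluster_name" --yes
  else
    echo "Name did not match; nothing was deleted."
    return 1
  fi
}
```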
kops rolling-update cluster

Some changes require a rolling update, for example changes to existing nodes (such as their machine type) or other major changes to the cluster configuration. Before performing a rolling update, apply the changes to the cloud resources with kops update cluster --yes. Nodes may additionally be marked for update by placing an annotation on them.

Preview a rolling update:

kops rolling-update cluster

Perform it (nodes are drained, and the cluster is validated between node replacements):

kops rolling-update cluster --yes
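Here is a minimal sketch of both pieces: marking a node through the kops.k8s.io/needs-update annotation described in the kops rolling-update docs, and waiting for the cluster to validate after the update. The function names and the annotation value are hypothetical placeholders; as far as I know, kops only looks at the annotation key.

```shell
#!/usr/bin/env bash
# Hypothetical helpers around the rolling-update commands.

# Marks a node so that the next rolling update replaces it.
mark_node_for_update() {
  local node="$1"   # e.g. ip-10-0-0-1.ec2.internal
  kubectl annotate node "$node" kops.k8s.io/needs-update=true --overwrite
}

# Performs the rolling update, then waits until the cluster validates.
roll_and_validate() {
  local cluster_name="$1"
  # Nodes are drained and replaced one at a time.
  kops rolling-update cluster --name "$cluster_name" --yes || return 1
  # Wait up to 10 minutes for nodes and system pods to be healthy.
  kops validate cluster --name "$cluster_name" --wait 10m
}
```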
These are the commands I consider important to start working with kOps. There are many other commands, resources, operations and addons worth exploring, but for now let’s focus on practice.

Provisioning an AWS Infrastructure with kOps

In this section I will show how to install kOps, install and configure the AWS CLI, configure a Route 53 subdomain, and create, edit and delete a cluster and its instances.

Install kOps

kOps can be installed alongside kubectl to manage and operate your Kubernetes cluster. On Linux, you can install it as follows:

curl -Lo kops$(curl -s | grep tag_name | cut -d '"' -f 4)/kops-linux-amd64
chmod +x ./kops
sudo mv ./kops /usr/local/bin/

You can also install it from source. For other operating systems, see the instructions on the official site.
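For reference, here is a sketch of the same installation split into small functions, following the install command in the kops documentation (latest release tag from the GitHub API, then the kops-linux-amd64 release asset). The function names are hypothetical, and the OS/arch arguments assume the kops-OS-ARCH asset naming.

```shell
#!/usr/bin/env bash
# Hypothetical helpers based on the kops install docs.

# Builds the download URL of a kops release asset.
kops_download_url() {
  local tag="$1" os="$2" arch="$3"
  echo "https://github.com/kubernetes/kops/releases/download/${tag}/kops-${os}-${arch}"
}

# Prints the tag of the latest kops release (e.g. v1.21.0).
latest_kops_tag() {
  curl -s https://api.github.com/repos/kubernetes/kops/releases/latest \
    | grep tag_name | cut -d '"' -f 4
}

# Downloads the binary and installs it into /usr/local/bin.
install_kops() {
  local os="${1:-linux}" arch="${2:-amd64}" tag url
  tag="$(latest_kops_tag)" || return 1
  [ -n "$tag" ] || return 1
  url="$(kops_download_url "$tag" "$os" "$arch")"
  curl -Lo kops "$url" && chmod +x ./kops && sudo mv ./kops /usr/local/bin/
}
```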

Configuring the AWS CLI

To interact with AWS resources you need the AWS CLI, which you can install via pip:

pip install awscli

After installing it, run the following command to configure it:

$ aws configure
AWS Access Key ID [None]: XXXXXXX
AWS Secret Access Key [None]:  XXXXXXX
Default region name [None]: us-west-2
Default output format [None]: json
Kops requires a set of IAM permissions to work properly, so next we create an IAM group and an IAM user, both named kops, with the required permissions.
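Here is a sketch of those IAM steps, following the kops getting-started guide for AWS: a kops group with the AmazonEC2FullAccess, AmazonRoute53FullAccess, AmazonS3FullAccess, IAMFullAccess and AmazonVPCFullAccess managed policies, plus a kops user in that group. The function name is hypothetical, and more recent versions of the kops docs also list AmazonSQSFullAccess and AmazonEventBridgeFullAccess, so check the guide for your version.

```shell
#!/usr/bin/env bash
# Hypothetical helper following the kops AWS getting-started guide.
KOPS_IAM_POLICIES="AmazonEC2FullAccess AmazonRoute53FullAccess AmazonS3FullAccess IAMFullAccess AmazonVPCFullAccess"

create_kops_iam_user() {
  local policy
  aws iam create-group --group-name kops || return 1
  # $KOPS_IAM_POLICIES is intentionally unquoted to iterate over the names.
  for policy in $KOPS_IAM_POLICIES; do
    aws iam attach-group-policy \
      --policy-arn "arn:aws:iam::aws:policy/${policy}" \
      --group-name kops || return 1
  done
  aws iam create-user --user-name kops || return 1
  aws iam add-user-to-group --user-name kops --group-name kops || return 1
  # Prints the new AccessKeyId and SecretAccessKey; store them safely.
  aws iam create-access-key --user-name kops
}
```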

Now you can list your users:

aws iam list-users

If you have other AWS profiles in your environment, you can set or change the default profile before provisioning the infrastructure with kOps.

First, configure a new profile. I called it kops and set up its keys:

aws configure --profile kops
AWS Access Key ID [None]: XXXXXXXXXX
AWS Secret Access Key [None]: XXXXXXXXXX
Default region name [None]: us-west-2
Default output format [None]: json

After that, we can confirm the new profile with cat ~/.aws/config:

[default]
region = us-east-2
[profile kops]
region = us-west-2
output = json

There are two ways to use the new profile: set the AWS_PROFILE variable to the profile name, or pass the --profile option to each aws command. I am going to use the first one.

❯ export AWS_PROFILE=kops

Because “aws configure” doesn’t export these variables for kops to use, we export them now:

export AWS_ACCESS_KEY_ID=$(aws configure get aws_access_key_id)
export AWS_SECRET_ACCESS_KEY=$(aws configure get aws_secret_access_key)


To build a Kubernetes cluster with kops we need DNS records, and there are several ways to provide them. In my case, I am going to use my own domain, which is hosted in AWS with Route 53. In this scenario, we have to create a hosted zone for a new subdomain and then set up delegation from the parent domain to the new zone, so that kops can create the records it needs.

To create the subdomain in Route 53, follow these steps:

  • create a hosted zone whose name is the subdomain;
  • Route 53 creates NS records for this new zone;
  • in the parent domain's hosted zone, create an NS record with the subdomain name whose values are the name servers Route 53 created in the first step.

You can test the delegation with the dig command:
❯ dig +short NS
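The three steps above can be sketched with the AWS CLI. The create-hosted-zone response already contains the new zone's delegation set, so its name servers can be fed straight into an NS record in the parent zone. The function names, the subdomain and the parent hosted zone ID are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the subdomain delegation steps.

# Prints a Route 53 change batch that delegates SUBDOMAIN to the given
# name servers (one NS record with several values).
ns_delegation_batch() {
  local subdomain="$1" records="" ns
  shift
  for ns in "$@"; do
    if [ -n "$records" ]; then records="${records},"; fi
    records="${records}{\"Value\":\"${ns}\"}"
  done
  printf '{"Comment":"Delegate %s","Changes":[{"Action":"UPSERT","ResourceRecordSet":{"Name":"%s","Type":"NS","TTL":300,"ResourceRecords":[%s]}}]}\n' \
    "$subdomain" "$subdomain" "$records"
}

# Creates the subdomain zone and adds the NS record to the parent zone.
delegate_subdomain() {
  local subdomain="$1" parent_zone_id="$2" ns_list
  # create-hosted-zone returns the delegation set of the new zone.
  ns_list="$(aws route53 create-hosted-zone --name "$subdomain" \
    --caller-reference "$(date +%s)" \
    --query 'DelegationSet.NameServers' --output text)" || return 1
  # $ns_list is intentionally unquoted so each name server becomes one argument.
  aws route53 change-resource-record-sets --hosted-zone-id "$parent_zone_id" \
    --change-batch "$(ns_delegation_batch "$subdomain" $ns_list)"
}
```

Usage: delegate_subdomain k8s.example.com ZPARENTZONEID, then check the result with dig.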

Create S3 Bucket

We need to create an S3 bucket where kops stores the state of our cluster. This bucket becomes the source of truth for the cluster configuration.

I create the S3 bucket with the name of my subdomain (or domain); since bucket names are globally unique, a domain you own is a convenient way to pick a name.

aws s3 mb s3://
S3 Versioning
It’s strongly recommended to enable versioning on your S3 bucket in case you ever need to revert or recover a previous state store:
aws s3api put-bucket-versioning --bucket  --versioning-configuration Status=Enabled
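Putting the bucket steps together, here is a minimal sketch that creates the state bucket, enables versioning, and turns on default SSE-S3 encryption. The function name and bucket name are hypothetical. Note that outside us-east-1, S3 requires an explicit LocationConstraint, while passing us-east-1 as a constraint is rejected.

```shell
#!/usr/bin/env bash
# Hypothetical helper: creates the state bucket, enables versioning and
# default encryption. The bucket name is a placeholder.
create_state_bucket() {
  local bucket="$1" region="$2"
  if [ "$region" = "us-east-1" ]; then
    # us-east-1 is the default location and must not be passed as a constraint.
    aws s3api create-bucket --bucket "$bucket" --region "$region" || return 1
  else
    aws s3api create-bucket --bucket "$bucket" --region "$region" \
      --create-bucket-configuration "LocationConstraint=${region}" || return 1
  fi
  # Keep previous versions of the state files so they can be restored.
  aws s3api put-bucket-versioning --bucket "$bucket" \
    --versioning-configuration Status=Enabled || return 1
  # Encrypt state objects at rest by default (SSE-S3).
  aws s3api put-bucket-encryption --bucket "$bucket" \
    --server-side-encryption-configuration \
    '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}'
}
```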

Before creating our cluster, let’s set some environment variables. If you export KOPS_STATE_STORE=s3:// (followed by your bucket name), kops uses this location by default. I suggest putting this in your bash profile.

export KOPS_STATE_STORE=s3://


Before provisioning our cluster, let’s list the availability zones available in the region:

aws ec2 describe-availability-zones --region us-west-2 --output text
AVAILABILITYZONES     us-west-2       available       usw2-az2        us-west-2a 
AVAILABILITYZONES     us-west-2       available       usw2-az1        us-west-2b 
AVAILABILITYZONES     us-west-2       available       usw2-az3        us-west-2c
AVAILABILITYZONES     us-west-2       available       usw2-az4        us-west-2d

In this case, we can use us-west-2a, us-west-2b, us-west-2c and us-west-2d.
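Instead of copying the zone names by hand, a small helper can turn them into the comma-separated list that --zones and --master-zones expect. This is a sketch; the function name zones_csv is hypothetical, and it uses the documented state filter and a JMESPath --query to keep only available zone names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the available zones of a region as a
# comma-separated list, ready for --zones / --master-zones.
zones_csv() {
  local region="$1"
  aws ec2 describe-availability-zones --region "$region" \
    --filters Name=state,Values=available \
    --query 'AvailabilityZones[].ZoneName' --output text \
    | tr -s '\t ' '\n' | sed '/^$/d' | paste -s -d ',' -
}

# Example: keep only the first three zones for the masters.
#   zones_csv us-west-2 | cut -d, -f1-3
```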

Finally, let’s create our first cluster:

kops create cluster --networking calico --node-count 3 --master-count 3 --zones us-west-2a,us-west-2b,us-west-2c --master-zones us-west-2a,us-west-2b,us-west-2c

A brief explanation is needed at this point.

Since KOPS_STATE_STORE=s3:// was set previously, we don't need to pass the state store in this command. The default networking in kOps is kubenet, which is not recommended for production because it does not support network policies and other features, so we have to use one of the supported CNI providers instead. In this case, I used Calico.

To keep the control plane from becoming unavailable, we created a highly available Kubernetes cluster with 3 master nodes and 3 worker nodes, spread across the availability zones passed in --zones and --master-zones (us-west-2a, us-west-2b and us-west-2c; you can use fewer). Without high availability, you cannot interact with the API server while the single master is being upgraded or has failed, so you can’t add nodes, scale pods, or replace terminated pods.

Each instance group runs as a dedicated ASG (Auto Scaling group), and the masters store etcd data on EBS volumes. As we’ll see soon in the configuration file, each instance group defines a minimum and a maximum number of nodes, initially set from the --node-count 3 and --master-count 3 parameters. Finally, we set the cluster name, which should match the subdomain created previously.

There is one more important option: the topology. By default the topology is public (our case); you can make it private with --topology private. So, what’s the difference between them?

With a public topology, the subnets are routed to an Internet gateway, so the instances get public IPs. With a private topology, masters and nodes run in private subnets, and the only public access is via the Kubernetes API (through a load balancer) and an optional SSH bastion instance (--bastion="true").
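For comparison, this is a sketch of the same HA cluster with a private topology and a bastion, using the --topology private and --bastion flags mentioned above. The function name and cluster name are hypothetical; note that a private topology needs a CNI other than kubenet, which Calico satisfies.

```shell
#!/usr/bin/env bash
# Hypothetical helper: same HA layout, but with a private topology
# and an SSH bastion instance group.
create_private_cluster() {
  local cluster_name="$1"   # placeholder, e.g. k8s.example.com
  local zones="$2"          # comma-separated zones
  kops create cluster \
    --name "$cluster_name" \
    --networking calico \
    --node-count 3 \
    --zones "$zones" \
    --master-zones "$zones" \
    --topology private \
    --bastion
}
```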

After running the command, kops prints some suggestions:


  • list clusters with: kops get cluster
  • edit this cluster with: kops edit cluster
  • edit your node instance group: kops edit ig – nodes-us-west-2a
  • edit your master instance group: kops edit ig – master-us-west-2a
As the suggestions show, before applying anything we can edit the cluster configuration or any node or master instance group. Let’s check, for example, the cluster configuration:
kops edit cluster
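spec:
  etcdClusters:
  - cpuRequest: 200m
    etcdMembers:
    - encryptedVolume: true
      instanceGroup: master-us-west-2a
      name: a
    - encryptedVolume: true
      instanceGroup: master-us-west-2b
      name: b
    # ... (remaining members and fields omitted)
  networking:
    calico: {}
  subnets:
  - cidr:
    name: us-west-2a
    type: Public
    zone: us-west-2a
  # ... (other subnets omitted)
  topology:
    dns:
      type: Public
    masters: public
    nodes: public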

Note that we can change the subnets, the network and cluster CIDRs and the requested resources, restrict access to the API, and much more.

Since kops separates the instance group configuration per availability zone, we edit each one as needed. Suppose we want to change the machine type or configure the autoscaling group for a specific availability zone; we can do it with this command:

kops edit ig – nodes-us-west-2b
Note that we are editing nodes-us-west-2b, that is, the node instance group of a specific availability zone: kops created one instance group for each availability zone we defined.

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  name: nodes-us-west-2b
spec:
  machineType: t3.medium
  maxSize: 1
  minSize: 1
  role: Node
  subnets:
  - us-west-2b

After reviewing and editing any configuration you want, it’s time to apply it and start provisioning.

kops update cluster --name --yes --admin

kOps has set your kubectl context to

Cluster changes have been applied to the cloud.

Yes!! It’s done! As you can see, kOps has set your kubectl context to

After a few minutes, you should have a highly available Kubernetes cluster running on EC2 instances through kOps.

You can now list all the pods running in the cluster and check that the Calico network is deployed.
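Here is a minimal sketch of that check: wait for kops validate cluster to succeed, then list the nodes and the Calico pods. The function name is hypothetical, and the k8s-app=calico-node label in kube-system is an assumption about how the Calico addon labels its DaemonSet pods.

```shell
#!/usr/bin/env bash
# Hypothetical helper: waits for the cluster to validate, then shows the
# nodes and the Calico pods.
check_cluster() {
  local cluster_name="$1"
  # Fails if the cluster is still not healthy after 10 minutes.
  kops validate cluster --name "$cluster_name" --wait 10m || return 1
  kubectl get nodes -o wide
  # Assumes the Calico pods carry the k8s-app=calico-node label in kube-system.
  kubectl -n kube-system get pods -l k8s-app=calico-node
}
```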

Now let’s add one more node to our cluster. It’s simple: edit the node instance group of the zone where you want the new node. In this case, I am going to deploy it in us-west-2a:

kops edit ig nodes-us-west-2a
  machineType: t3.medium
  maxSize: 2
  minSize: 2

Don’t forget to apply these changes…

❯ kops update cluster --name --yes --admin

Some changes require a rolling update. If so, run:

❯ kops rolling-update cluster --yes  

Now you can see that the new node was deployed in that availability zone and is part of the cluster.
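If you prefer to script this instead of using the interactive editor, one option is to export the instance group, rewrite minSize and maxSize, and push it back with kops replace -f. This is a sketch under that assumption; the function names are hypothetical, and the sed expressions expect the usual one-field-per-line YAML that kops get -o yaml prints.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: change an instance group's size without an editor.

# Rewrites the minSize/maxSize lines of an exported InstanceGroup manifest.
set_ig_size() {
  local file="$1" min="$2" max="$3"
  # -i.bak works with both GNU and BSD sed; the backup is removed afterwards.
  sed -i.bak \
    -e "s/^\([[:space:]]*minSize:\).*/\1 ${min}/" \
    -e "s/^\([[:space:]]*maxSize:\).*/\1 ${max}/" \
    "$file" && rm -f "${file}.bak"
}

# Exports the instance group, resizes it, writes it back and applies it.
scale_ig() {
  local cluster_name="$1" ig="$2" min="$3" max="$4" file
  file="$(mktemp)"
  kops get ig "$ig" --name "$cluster_name" -o yaml > "$file" || { rm -f "$file"; return 1; }
  set_ig_size "$file" "$min" "$max"
  kops replace -f "$file" || { rm -f "$file"; return 1; }
  rm -f "$file"
  kops update cluster --name "$cluster_name" --yes
}
```

Usage: scale_ig k8s.example.com nodes-us-west-2a 2 2 reproduces the edit above.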

A common bottleneck of the control plane is the API server. As the number of pods and nodes grows, you will want to add more resources to handle the load, so let’s deploy a dedicated API server instance group:

kops create ig new-apiserver --dry-run -o yaml > api-server.yaml

❯ cat api-server.yaml
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  name: new-apiserver
spec:
  machineType: t3.micro
  maxSize: 1
  minSize: 1
  role: APIServer
  subnets:
  - us-west-2a
  - us-west-2b

Here I changed the role to APIServer, set the min and max size, and changed the machineType to t3.micro.

❯ kops create -f api-server.yaml
❯ kops update cluster --name --yes    
❯ kops rolling-update cluster --yes

Because this change involves the control plane nodes, it requires a rolling update. The master nodes are replaced one at a time, but the cluster stays available because we set up a highly available control plane.

Finally, let’s delete one node instance group.

❯ kops get instancegroups #get the instancegroups
❯ kops delete ig nodes-us-west-2b
Do you really want to delete instance group "nodes-us-west-2b"? This action cannot be undone. (y/N)
InstanceGroup "nodes-us-west-2b" found for deletion
I0716 15:15:48.767476   21651 delete.go:54] Deleting "nodes-us-west-2b"

The final result is:

❯ kubectl get nodes
NAME   STATUS   ROLES                             AGE     VERSION
       Ready    node                              68m     v1.21.2
       Ready    api-server,control-plane,master   2m54s   v1.21.2
       Ready    node                              68m     v1.21.2
       Ready    node                              46m     v1.21.2
       Ready    api-server,control-plane,master   17m     v1.21.2
       Ready    api-server,control-plane,master   9m56s   v1.21.2
       Ready    api-server                        11m     v1.21.2

Deploying Prometheus and Grafana in Kubernetes Cluster
For testing purposes, I’ve deployed Prometheus and Grafana in the AWS cluster, along with an NGINX ingress controller to expose the app outside the cluster. I don’t cover the app deployment or the ingress installation and configuration here, but you can see the final result:

kubectl describe ingress -n ingress-nginx     
Name:             ingress-host
Namespace:        ingress-nginx
Default backend:  default-http-backend:80 (<error: endpoints "default-http-backend" not found>)
Rules:
  Host                               Path  Backends
  ----                               ----  --------
                                     /   grafana:3000 (
Annotations:                         <none>
Events:
  Type    Reason  Age                From                      Message
  ----    ------  ----               ----                      -------
  Normal  Sync    15s (x3 over 10m)  nginx-ingress-controller  Scheduled for sync

The exposed ingress has the following address:

So, I have to set up a CNAME DNS record pointing to this address.
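Here is a sketch of that DNS step: read the load balancer hostname from the ingress status with a kubectl jsonpath query, then UPSERT a CNAME in Route 53. The ingress name and namespace come from the describe output above; the hosted zone ID, the host name and the function names are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: look up the load balancer hostname of the ingress
# and UPSERT a CNAME for it in Route 53.

# Prints a Route 53 change batch that points NAME at TARGET via CNAME.
cname_batch() {
  local name="$1" target="$2"
  printf '{"Changes":[{"Action":"UPSERT","ResourceRecordSet":{"Name":"%s","Type":"CNAME","TTL":300,"ResourceRecords":[{"Value":"%s"}]}}]}\n' \
    "$name" "$target"
}

point_host_at_ingress() {
  local zone_id="$1" host="$2" lb
  # Hostname assigned by AWS to the ingress controller's load balancer.
  lb="$(kubectl -n ingress-nginx get ingress ingress-host \
    -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')" || return 1
  [ -n "$lb" ] || return 1
  aws route53 change-resource-record-sets --hosted-zone-id "$zone_id" \
    --change-batch "$(cname_batch "$host" "$lb")"
}
```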



And that’s it. We have successfully deployed a highly available and resilient Kubernetes cluster using kops. This post has shown how to manage a Kubernetes cluster on AWS using kops. I think kops is an awesome tool for running a production-grade Kubernetes cluster on AWS or other cloud providers. Try it!