The power of labels in Kubernetes

Posted at 2019-12-18

Hello, my name is Adam and I am part of the infrastructure team at Linkbal. Recently I have been building a Kubernetes infrastructure for one of our services, as well as taking part in our weekly Kubernetes study group on Mondays. Today I would like to discuss one of the most powerful aspects of Kubernetes: labels. I will cover what they are, why you should use them, and how they work with Selectors.

What is a label?

A label is simply that: an arbitrary key/value pair attached to a Kubernetes resource. Labels can then be used to select resources with label Selectors. A resource can have more than one label, but each label key must be unique within that resource. Labels are usually attached when a resource is created, but they can be added or changed later.

OK, but why bother?

Consider a microservices architecture with tens of pods running simultaneously. How would you differentiate between the pods? Pod names must be unique within a namespace, so you could name them frontend-1, backend-3, website-stg or similar. But this can get out of hand very quickly. What happens if there are two development-level pods called frontend-X-dev which, because different developers are working on them, now have differing contents? It would be very easy to forget which pod is which! On top of that, what if you wanted to make a change to all staging-level pods? You would have to specify them by pod name one at a time; in a large microservices architecture, this would be very time-consuming and prone to mistakes.

This is where labels come in. By attaching labels to pods, you can easily differentiate between them and even specify groups of pods with matching labels.

Example pods

Below is an example of how labels can be attached to pods on creation. Labels are a part of the 'metadata' section.

apiVersion: v1
kind: Pod
metadata:
  name: my-app-2
  labels:
    team: infra
    env: dev
spec:
  containers:
....{Container details}....

Consider pods with the following labels:

Name	team	env
my-app-1	dev	dev
my-app-2	infra	dev
my-app-3	dev	stg
my-app-4	infra	prod

Here are some basic commands that make use of the labels attached to the pods:

# Show all pods with all labels
kubectl get pods --show-labels
# List only pods with env=dev
kubectl get pods -l env=dev
# List only pods with team=infra and env=dev
kubectl get pods -l team=infra,env=dev
# Delete all pods whose env label is not prod
# (this also matches pods that have no env label at all)
kubectl delete pods -l 'env!=prod'
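
To make the matching rules behind these commands concrete, here is a minimal Python sketch that applies the same equality-based selectors to the example pods from the table. It does not talk to a cluster: the `PODS` dictionary and the `matches` and `select` helpers are hypothetical names introduced only for illustration, and the sketch handles only the `=`, `==` and `!=` forms.

```python
# Hypothetical in-memory copy of the example pods; nothing is read from a cluster.
PODS = {
    "my-app-1": {"team": "dev", "env": "dev"},
    "my-app-2": {"team": "infra", "env": "dev"},
    "my-app-3": {"team": "dev", "env": "stg"},
    "my-app-4": {"team": "infra", "env": "prod"},
}


def matches(labels, selector):
    """Return True if labels satisfy every comma-separated requirement."""
    for requirement in selector.split(","):
        requirement = requirement.strip()
        if "!=" in requirement:
            key, value = requirement.split("!=", 1)
            # A pod that lacks the key also satisfies key!=value.
            if labels.get(key) == value:
                return False
        else:
            # Treat key==value the same as key=value.
            key, value = requirement.replace("==", "=", 1).split("=", 1)
            if labels.get(key) != value:
                return False
    return True


def select(selector, pods=PODS):
    """Return the sorted names of the pods that the selector matches."""
    return sorted(name for name, labels in pods.items() if matches(labels, selector))


print(select("env=dev"))             # ['my-app-1', 'my-app-2']
print(select("team=infra,env=dev"))  # ['my-app-2']
print(select("env!=prod"))           # ['my-app-1', 'my-app-2', 'my-app-3']
```

Note that the comma acts as a logical AND, and that `env!=prod` would also match a pod with no `env` label, which is worth keeping in mind before running a delete with it.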

Selectors

The set of pods that a Service targets or a ReplicationController manages is defined by a label selector, and for these two resources the selector is equality-based only. Newer resources such as ReplicaSets, Jobs and Deployments support set-based selectors as well. If a selector lists multiple labels, a resource must match all of them to be targeted.

# Basic equality-based selector.
selector:
  env: dev
---
# Newer matchLabels selector. Same as above.
selector:
  matchLabels:
    env: dev
---
# Set-based matchExpressions selector.
# Possible operators: In, NotIn, Exists and DoesNotExist.
selector:
  matchExpressions:
  - key: env
    operator: In
    values:
    - dev
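
The four set-based operators can be sketched in a few lines of Python. This is only an illustration of the matching semantics, not how Kubernetes implements them; `expression_matches` and `selector_matches` are hypothetical helper names, and the selector is written as a plain dictionary mirroring the YAML structure.

```python
def expression_matches(labels, expression):
    """Evaluate one matchExpressions entry against a pod's labels."""
    key = expression["key"]
    operator = expression["operator"]
    values = expression.get("values", [])
    if operator == "In":
        return key in labels and labels[key] in values
    if operator == "NotIn":
        # NotIn also matches pods that lack the key entirely.
        return labels.get(key) not in values
    if operator == "Exists":
        return key in labels
    if operator == "DoesNotExist":
        return key not in labels
    raise ValueError(f"unknown operator: {operator}")


def selector_matches(labels, selector):
    """All matchLabels and matchExpressions requirements are ANDed together."""
    for key, value in selector.get("matchLabels", {}).items():
        if labels.get(key) != value:
            return False
    return all(
        expression_matches(labels, expression)
        for expression in selector.get("matchExpressions", [])
    )


selector = {
    "matchLabels": {"team": "infra"},
    "matchExpressions": [{"key": "env", "operator": "In", "values": ["dev", "stg"]}],
}

print(selector_matches({"team": "infra", "env": "dev"}, selector))   # True
print(selector_matches({"team": "infra", "env": "prod"}, selector))  # False
print(selector_matches({"team": "dev", "env": "dev"}, selector))     # False
```

A matchLabels entry is equivalent to an In expression with a single value, which is why the two can be combined freely in one selector.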

Let us consider an example scenario. Say you want to expand the development testing of my-app-2 by provisioning up to 10 pods in total for your infrastructure team. You could copy and paste the pod's creation file, specify new names (remember: pods cannot share a name) and then create them all. This is hardly efficient and results in multiple copies of essentially the same file. Alternatively, you can create a ReplicaSet, specify a matchLabels selector and set replicas to 10. The ReplicaSet will see the existing pod my-app-2, count it towards its desired pod count, then create the remaining pods with names ending in randomized suffixes.

apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: my-app-2
spec:
  replicas: 10
  selector:
    matchLabels:
      team: infra
      env: dev
  template:
    metadata:
      labels:
        team: infra
        env: dev
    spec:
      containers:
      - image: repository/image
        name: my-container

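With only a ReplicaSet and a matchLabels selector, you can now easily control the number of pods that you want for my-app-2. You could even change the image or labels in the template if you wished, but note that a ReplicaSet applies template changes only to pods it creates afterwards; existing pods are left as they are.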
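
The counting step described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: it ignores owner references, namespaces and pod phase, and `matching_pods` and `pods_to_create` are hypothetical helper names rather than anything from the Kubernetes API.

```python
def matching_pods(pods, match_labels):
    """Return the sorted names of pods carrying every label in match_labels."""
    return sorted(
        name
        for name, labels in pods.items()
        if all(labels.get(key) == value for key, value in match_labels.items())
    )


def pods_to_create(desired_replicas, pods, match_labels):
    """Return how many pods are missing to reach desired_replicas."""
    return max(desired_replicas - len(matching_pods(pods, match_labels)), 0)


# The example pods from the table earlier in this post.
existing = {
    "my-app-1": {"team": "dev", "env": "dev"},
    "my-app-2": {"team": "infra", "env": "dev"},
    "my-app-3": {"team": "dev", "env": "stg"},
    "my-app-4": {"team": "infra", "env": "prod"},
}
selector = {"team": "infra", "env": "dev"}

print(matching_pods(existing, selector))       # ['my-app-2']
print(pods_to_create(10, existing, selector))  # 9
```

Only my-app-2 carries both labels, so it is counted towards the desired 10 and nine more pods would be created from the template.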

Now that you have provisioned 10 pods for the infrastructure team, let's say you want to expose one of them through a Service endpoint. Simply attach a new label to the pod you want to expose, then create a load balancer Service whose Selector contains the same labels.

# Label the pod
kubectl label pod my-app-2 expose=load-balancer

Create the load balancer Service targeting the pod:

apiVersion: v1
kind: Service
metadata:
  name: my-app-2-svc
spec:
  type: LoadBalancer
  ports:
  - port: 80
    targetPort: {container port}
  selector:
    team: infra
    expose: load-balancer
    env: dev

Now the Service routes traffic only to the pod whose labels match all three labels in the Service Selector. There was no need to look up an IP address; everything was done simply by specifying labels.

Investigating a pod

Using a ReplicaSet with a selector has another advantage. Suppose one of the pods has been behaving strangely and you want to investigate inside its container. In a production environment, stopping and starting processes in a pod that is serving traffic would be a bad idea. With labels, you don't have to worry. Simply change one of the labels listed in the selector, and the ReplicaSet will no longer manage the pod. Because the number of matching pods is now one below the desired count, the ReplicaSet immediately provisions a new pod to replace it. You can then freely enter the pod's containers and run whatever commands you wish, without fear of affecting the main service. On a side note, if the pods are exposed by a Service such as a load balancer, that Service has a label Selector too. If the label you changed is also part of the Service's Selector, as env is in the example above, traffic will no longer be routed to the pod.

# Change the value of the label env to bug
kubectl label pod my-app-2 env=bug --overwrite
# The ReplicaSet will then spin up a new pod.
# Start a bash session in the pod container
kubectl exec -it my-app-2 -- bash
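
The effect of the relabel can be traced with a small Python sketch. It is a simulation only: the replica names are hypothetical, and the Service selector is assumed to be team=infra, env=dev and expose=load-balancer, i.e. the labels carried by the exposed pod after the kubectl label command shown earlier.

```python
REPLICAS = 10
REPLICASET_SELECTOR = {"team": "infra", "env": "dev"}
# Assumed Service selector: the labels of the exposed pod.
SERVICE_SELECTOR = {"team": "infra", "env": "dev", "expose": "load-balancer"}


def select(pods, selector):
    """Return the sorted names of pods carrying every label in selector."""
    return sorted(
        name
        for name, labels in pods.items()
        if all(labels.get(key) == value for key, value in selector.items())
    )


# my-app-2 plus nine replicas with hypothetical generated names.
pods = {"my-app-2": {"team": "infra", "env": "dev", "expose": "load-balancer"}}
for i in range(9):
    pods[f"my-app-2-replica{i}"] = {"team": "infra", "env": "dev"}

# Equivalent of: kubectl label pod my-app-2 env=bug --overwrite
pods["my-app-2"]["env"] = "bug"

managed = select(pods, REPLICASET_SELECTOR)
missing = REPLICAS - len(managed)
routed = select(pods, SERVICE_SELECTOR)

print(len(managed))        # 9
print(missing)             # 1
print(routed)              # []
print("my-app-2" in pods)  # True
```

The relabeled pod still exists and can be inspected, but it no longer matches either selector: the ReplicaSet is one pod short and would create a replacement, and the Service has nothing left to route to until another pod is given the expose label.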

Conclusion

This has been a short explanation of the power and uses of labels in Kubernetes. If there is one thing you should take away from this, it's that you should label extensively in Kubernetes. Labels are not only useful, but necessary for a comprehensive Kubernetes infrastructure. If you start a new Kubernetes project, it would be wise to sit down with your team and decide on a labeling scheme, both to keep team members consistent and to make everything easier to understand, even for those outside the team.
