Kubernetes ReplicaSets overview

What is a Kubernetes ReplicaSet?

A ReplicaSet (RS) is a Kubernetes object used to maintain a stable set of replicated pods running within a cluster at any given time. A Kubernetes pod is a cluster deployment unit that typically contains one or more containers. Pods (and, by extension, containers) are nevertheless short-lived entities. Starting in the Pending phase, pods progress to the Running phase once at least one of their primary containers starts successfully. A container hosting a sample application, for example, could fail. You might expect the kubelet to recreate the pod automatically, but pods are not rescheduled when they die. This also affects the containers inside the pod, which are not rescheduled either and may be discarded owing to a lack of resources. To keep your application running, you need to keep track of the health of your pods.

The Kubernetes replication controller is in charge of maintaining pod lifecycles by scheduling a replacement for any pod that fails. There are several distinct controllers, each with a different use case. They all have one thing in common: each watches a specific set of pods to make sure the proper number of them are always running. In this article, we’ll focus on the ReplicaSet (RS) as one of the key controllers and compare it to the others.

So what is a ReplicaSet?

A ReplicaSet (RS) is a Kubernetes object used to maintain a stable set of replicated pods running within a cluster at any given time. 

As stated above, the main goal of Kubernetes controllers is to maintain a desired state by ensuring that, at any given moment, N pods of the same type are running. As such, ReplicaSets are often used to guarantee the availability of a service. Kubernetes automatically deploys extra pods to replace those that fail or become inaccessible, preventing users from losing access to an application. Without ReplicaSets, we would have to write as many manifests as the number of pods we require, which would be a lot of work to repeat for each application.

A ReplicaSet has two main features: a pod template for creating new pods whenever existing ones fail, and a replica count that defines the number of replicas the controller is supposed to keep running. A ReplicaSet also scales down or deletes pods whenever surplus instances with a matching label are created.

As a result, it ensures that a specified number of replica pods are running continuously. A Kubernetes RS helps with load-balancing, reliability, and scaling, as follows:

  • Load balancing: Replication allows Kubernetes to have multiple instances of a pod. This means that traffic is sent to different instances, which prevents a single instance from being overloaded.
  • Reliability: Replication ensures that we have multiple instances of an application, which means that it won’t fail just because one of the containers fails.
  • Scaling: With Kubernetes, you can quickly scale your application up or down by adding or removing instances.

ReplicaSet configuration

apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: my-replicaset
  labels:
    my-label: my-value
spec:
  replicas: 3
  selector:
    matchLabels:
      my-label: my-value
  template:
    metadata:
      labels:
        my-label: my-value
    spec:
      containers:
        - name: app-container
          image: my-image:latest

The following fields are used to define a ReplicaSet:

  • Selector: Used to identify which pods this ReplicaSet is responsible for.
  • Replicas: Indicates the number of pods the ReplicaSet should maintain. 
  • Template: Defines the pod template that the ReplicaSet uses when creating pods and adding them to meet the required number of replicas.
  • ApiVersion: Defines the version of the Kubernetes API that supports the ReplicaSet resource. Every API version supports certain resources, and together with Kind it unambiguously identifies the resource type.
  • Kind: Identifies the resource as a ReplicaSet for the Kubernetes API.

The above RS manifest uses a ReplicaSet to run three copies of the my-image:latest container image. You can alter the number of copies by adjusting the replicas value in the manifest and re-applying it (kubectl apply -f my-manifest.yml).
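If you prefer not to edit the manifest, the replica count can also be changed imperatively. A minimal sketch, assuming the ReplicaSet above has already been applied to a cluster that kubectl can reach:

```shell
# Scale the ReplicaSet named my-replicaset to 5 replicas
kubectl scale replicaset my-replicaset --replicas=5

# Verify the desired, current, and ready replica counts
kubectl get rs my-replicaset
```

Note that if the ReplicaSet is managed by a Deployment, the Deployment will eventually reconcile the count back to its own spec, so imperative scaling is best reserved for standalone ReplicaSets or quick experiments.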

ReplicaSets compared to Deployments

Deployments are a higher-level alternative to ReplicaSets, as they are used to manage ReplicaSets. They are handy when it comes to rolling out changes to a set of pods via a ReplicaSet. When managing a ReplicaSet through a Deployment, you can simply roll back to a previous Deployment revision. You can also use a Deployment to create a new revision of a ReplicaSet and then migrate existing pods from the older revision to the new one. After that, the Deployment takes care of cleaning up old, unused ReplicaSets.

In terms of usage, a ReplicaSet is in charge of ensuring that pods are available, whereas a Deployment is responsible for managing different versions of an application by controlling one or more ReplicaSets. A Deployment is a higher abstraction that controls the release of a new version by managing one or more ReplicaSets. ReplicaSets can still manage pods and scale instances of specific pods, but they can’t make rolling updates or use several other features. Instead, this functionality is managed by a Deployment, which is the resource that a Kubernetes user would most likely interface with today.

Deployment configuration

Deployments, like ReplicaSets, are Kubernetes API objects that need the apiVersion, kind, and metadata defined. The example of an nginx-deployment.yaml provided in the Kubernetes documentation is a fantastic way to show the basic functionality of Deployments:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:            # (1)
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:        # (2)
      containers:
      - name: nginx
        image: nginx:1.7.9
        ports:
        - containerPort: 80

In the example above, we have a Deployment named nginx-deployment. First of all, observe how its declaration is almost identical to the declaration of a ReplicaSet. The contents of the ReplicaSet will actually be copied from this declaration.

Now, notice that it needs three NGINX pods to reach the desired state. To accomplish this, it creates a ReplicaSet containing all of the information required to scale our NGINX application to three instances. 

It then gives us a declarative interface through which we can control the ReplicaSet and pods. Its selector (inside the first spec subresource) detects which pods in the cluster are managed by this Deployment by matching the label app: nginx (declared inside the second spec subresource). It also refers to the container nginx, which is declared in the Deployment’s pod template. See the upstream documentation for more information on Deployments.
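As a sketch of that declarative workflow, assuming the nginx-deployment above has been applied, the standard kubectl rollout subcommands drive and inspect revisions (the nginx:1.9.1 image tag is only an example of a new version):

```shell
# Trigger a new revision: the Deployment creates a new ReplicaSet
# and gradually scales down the old one
kubectl set image deployment/nginx-deployment nginx=nginx:1.9.1

# Watch the rollout progress and inspect the revision history
kubectl rollout status deployment/nginx-deployment
kubectl rollout history deployment/nginx-deployment

# Roll back to the previous revision if something goes wrong
kubectl rollout undo deployment/nginx-deployment
```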

ReplicaSet vs. ReplicationController

Kubernetes ReplicaSets have replaced the older Kubernetes ReplicationControllers. While they serve the same purpose and behave very similarly, there is one key difference: a ReplicationController’s label selector only supports equality-based requirements, whereas a ReplicaSet’s selector also supports set-based requirements (matching a label key against a set of values). This added selector flexibility is the main reason ReplicaSets came to replace ReplicationControllers.
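To illustrate, a ReplicaSet selector can combine equality-based matchLabels with set-based matchExpressions, which a ReplicationController’s selector cannot express. A hedged sketch (the label keys and values here are hypothetical):

```yaml
selector:
  matchLabels:
    app: my-app            # equality-based: the label must equal this value
  matchExpressions:
    - key: tier
      operator: In         # set-based: the label value must be in this set
      values:
        - frontend
        - cache
    - key: environment
      operator: NotIn      # set-based: the label value must NOT be in this set
      values:
        - dev
```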

The “rolling-update” functionality was one of the most important features of ReplicationControllers. This functionality allowed ReplicationControllers to update the pods they supervised with minimal/no downtime to the service that those pods provided. To accomplish this, the old instances of the pods were upgraded one-by-one. Still, ReplicationControllers were seen as rigid and inflexible, which is why ReplicaSets and Deployments were introduced to replace them.

The following example of a ReplicationController config YAML from the official documentation runs three copies of the NGINX web server:

apiVersion: v1
kind: ReplicationController
metadata:
  name: nginx
spec:
  replicas: 3
  selector:
    app: nginx
  template:
    metadata:
      name: nginx
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80

ReplicaSet vs. StatefulSet

StatefulSets can be used to create pods with a guaranteed start-up order and unique identifiers. These can be used to make sure that a pod maintains its identity throughout the StatefulSet’s lifecycle. Importantly, this feature enables stateful apps to function within Kubernetes because it ensures common persistent components like storage and networking. Furthermore, pods are always formed in the same order and assigned identifiers that are applied to hostnames and the internal cluster DNS when they are created. Those identifiers ensure that pods in the environment have stable and predictable network identities.

A StatefulSet is useful for guaranteeing the order and uniqueness of pods, as it keeps a persistent identity for each pod using a unique naming convention. These pods are made to identical specifications, but they are not interchangeable; rather, each one has a unique identity that it keeps through any rescheduling. Each pod is assigned a DNS name based on the following convention: <statefulset name>-<ordinal index>.

PersistentVolumeClaim templates are a key feature of StatefulSets: each replica has its own volume and state, ensuring persistence in StatefulSets. Each pod generates its own PersistentVolumeClaim (PVC). As a result, this powerful facet comes in handy when replicating large datasets, such as the contents of highly available databases.
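To make this concrete, here is a minimal StatefulSet sketch (the names, storage size, and headless Service are illustrative assumptions): each replica gets a stable name such as web-0 and web-1, and its own PVC stamped from the template (data-web-0, data-web-1), which it keeps across rescheduling.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: web-headless   # headless Service that provides the stable DNS names
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: nginx
          image: nginx:1.7.9
          volumeMounts:
            - name: data
              mountPath: /usr/share/nginx/html
  volumeClaimTemplates:       # one PVC per replica, reattached on rescheduling
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi
```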

StatefulSets have many use cases for applications that require one or more of the following:

  • Network identities that are both stable and unique.
  • Storage that is both stable and persistent.
  • Scaling and deployment that is orderly and seamless.
  • Rolling updates that are ordered and automated.

For more information on StatefulSets, see the upstream documentation.

ReplicaSet vs. DaemonSet

DaemonSets are a key component of the Kubernetes cluster and allow administrators to configure services (pods) across all or a subset of Kubernetes nodes with ease. They ensure that a pod is replicated on some or all of the nodes; when a new node is added to the cluster, a new pod is created on that node. Therefore, there is no need to indicate the number of replicas a DaemonSet should deploy.

The DaemonSet controller guarantees that garbage collection is performed on pods associated with nodes when those nodes are deleted. When you delete a DaemonSet, you also delete all of the pods that it has produced. DaemonSets are frequently used to run cluster-level applications; for example, they are used for tasks including:

  • Installing a cluster storage daemon (such as glusterd or Ceph) on each node.
  • Using a log collecting daemon (such as Fluentd or Logstash) on each node.
  • Running a node monitoring daemon (like Prometheus Node Exporter, collectd, or Datadog Agent) on each node.

Like a ReplicaSet, a DaemonSet is a controller that keeps a set of pods running and automatically adds or deletes pods as nodes join or leave the cluster; the difference is that a DaemonSet places exactly one pod on each eligible node rather than maintaining a replica count. You should use a DaemonSet instead of a ReplicaSet for pods that perform machine-level functions (like machine monitoring or machine logging). The lifecycle of these pods is tied to the lifecycle of the machine: they must be running on the machine before other pods start, and they are safe to terminate when the machine is ready to be rebooted or shut down.

A YAML file can be used to describe a DaemonSet. For example, the following file (daemonset.yaml) describes a DaemonSet that runs the fluentd-elasticsearch Docker image:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd-elasticsearch
  namespace: kube-system
  labels:
    k8s-app: fluentd-logging
spec:
  selector:
    matchLabels:
      name: fluentd-elasticsearch
  template:
    metadata:
      labels:
        name: fluentd-elasticsearch
    spec:
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      containers:
      - name: fluentd-elasticsearch
        image: quay.io/fluentd_elasticsearch/fluentd:v2.5.2
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
      terminationGracePeriodSeconds: 30
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers

Conclusion

In this post, we showed you how to use Kubernetes ReplicaSets to automate application lifecycle management to increase productivity. We also discussed various alternative resources – including Deployments, StatefulSets, and DaemonSets – and explained their use cases. In order to make full use of this knowledge in your Kubernetes clusters, you must evaluate the needs of your pods so that you can select the optimum resource with the best functionality for your use case.