When it comes to creating new Pods from a ReplicationController or ReplicaSet, ServiceAccounts for namespaces, or even new Endpoints for a Service, kube-controller-manager is the component responsible for carrying out these tasks. Monitoring the Kubernetes controller manager is fundamental to ensure the proper operation of your Kubernetes cluster.
If you are on your cloud-native journey, running your workloads on top of Kubernetes, don’t overlook kube-controller-manager observability. If the Kubernetes controller manager runs into issues, no new Pods (among many other objects) will be created. That’s why monitoring the Kubernetes controller manager is so important!
If you are interested in knowing more about monitoring the Kubernetes controller manager with Prometheus, and which metrics are the most important to check, keep reading!
What is kube-controller-manager?
The Kubernetes controller manager is a component of the control plane, running in the form of a container within a Pod on every master node. You’ll find its definition on every master node at the following path: /etc/kubernetes/manifests/kube-controller-manager.yaml.
Kube-controller-manager is a collection of different Kubernetes controllers, all of them compiled into a single binary and running permanently in a loop. Its main task is to watch for changes in the state of objects and make sure the actual state converges towards the new desired state. In summary, it is responsible for the reconciliation tasks around the state of Kubernetes objects.
Controllers running in the kube-controller-manager use a watch mechanism to get notified every time there is a change on the resources they care about. Each controller then acts on the notification as required (create/delete/update).
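If you are curious about what this watch mechanism looks like, you can open a watch stream yourself with kubectl. A minimal sketch, where the namespace and resources are just examples:

# Stream change notifications for ReplicaSets in the default namespace,
# the same kind of event feed the controllers consume from the API server.
kubectl get --raw "/apis/apps/v1/namespaces/default/replicasets?watch=true"

# Or, at a higher level, watch Pod changes as they happen:
kubectl get pods --watch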
There are multiple controllers running in the kube-controller-manager. Each one of them is responsible for reconciling its own kind of object. Let’s talk about some of them:
ReplicaSet controller: This controller watches the desired number of replicas for a ReplicaSet and compares it with the number of Pods matching its Pod selector. If the controller is informed via the watch mechanism of a change in the desired number of replicas, it acts accordingly via the Kubernetes API. If it needs to create a new Pod because the actual number of replicas is lower than desired, it creates the new Pod manifests and posts them to the API server.
Deployment controller: It takes care of keeping the actual Deployment state in sync with the desired state. When there is a change in a Deployment, this controller performs a rollout of a new version. As a consequence, a new ReplicaSet is created, scaling up the new Pods and scaling down the old ones. How this is performed depends on the strategy specified in the Deployment.
Namespace controller: When a Namespace is deleted, all the objects belonging to it must be deleted as well. The Namespace controller is responsible for completing these deletion tasks.
ServiceAccount controller: Every time a namespace is created, the ServiceAccount controller ensures a default ServiceAccount is created for that namespace. Along with this controller, a Token controller runs at the same time, acting asynchronously and watching ServiceAccount creation and deletion to create or delete the corresponding token. It also handles ServiceAccount secret creation and deletion.
Endpoint controller: This controller is responsible for updating and maintaining the list of Endpoints in a Kubernetes cluster. It watches both Service and Pod resources. When Services or Pods are added, deleted, or updated, it selects the Pods matching the Service selector and adds their IPs and ports to the Endpoints object. When a Service is deleted, the controller deletes the dependent Endpoints of that Service.
PersistentVolume controller: When a user creates a PersistentVolumeClaim (PVC), Kubernetes must find an appropriate PersistentVolume to satisfy the request and bind it to the claim. When the PVC is deleted, the volume is unbound and reclaimed according to its reclaim policy. The PersistentVolume controller is responsible for these tasks.
In summary, kube-controller-manager and its controllers reconcile the actual state with the desired state, writing the new actual state to the resources’ status section. Controllers don’t talk to each other; they always talk directly to the Kubernetes API server.
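You can see the result of this reconciliation loop in any object’s status section. A quick sketch, where the Deployment name is only an example and the output shown is illustrative:

# Scale a Deployment, then inspect the status the controllers write back.
kubectl scale deployment nginx --replicas=3
kubectl get deployment nginx -o jsonpath='{.status}{"\n"}'
# {"availableReplicas":3,"observedGeneration":2,"readyReplicas":3,"replicas":3,...}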
How to monitor Kubernetes controller manager
Kube-controller-manager is instrumented and provides its own metrics endpoint by default; no extra action is required. The exposed port for accessing the metrics endpoint is 10257 for every kube-controller-manager Pod running in your Kubernetes cluster.
In this section, you’ll find the steps to access the metrics endpoint manually, and how to scrape its metrics from a Prometheus instance.
Note: If you deployed your Kubernetes cluster with kubeadm using the default values, you may have difficulties reaching the 10257 port and scraping metrics from Prometheus. Kubeadm sets the kube-controller-manager bind-address to 127.0.0.1, so only Pods in the host network can reach the metrics endpoint: https://127.0.0.1:10257/metrics.
Getting access to the endpoint manually
As discussed earlier, depending on how your Kubernetes cluster was deployed, you may face issues accessing the kube-controller-manager 10257 port. For that reason, getting access to the kube-controller-manager metrics by hand is only possible when either the controller Pod is started with --bind-address=0.0.0.0, or, if bind-address is 127.0.0.1, when accessing it from the master node itself or from a Pod in the host network. You can check which bind address is in use by inspecting the Pod definition:
$ kubectl get pod kube-controller-manager-k8s-control-1.lab.example.com -n kube-system -o json
…
"command": [
  "kube-controller-manager",
  "--allocate-node-cidrs=true",
  "--authentication-kubeconfig=/etc/kubernetes/controller-manager.conf",
  "--authorization-kubeconfig=/etc/kubernetes/controller-manager.conf",
  "--bind-address=127.0.0.1",
  "--client-ca-file=/etc/kubernetes/pki/ca.crt",
  "--cluster-cidr=192.169.0.0/16",
  "--cluster-name=kubernetes",
  "--cluster-signing-cert-file=/etc/kubernetes/pki/ca.crt",
  "--cluster-signing-key-file=/etc/kubernetes/pki/ca.key",
  "--controllers=*,bootstrapsigner,tokencleaner",
  "--kubeconfig=/etc/kubernetes/controller-manager.conf",
  "--leader-elect=true",
  "--requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt",
  "--root-ca-file=/etc/kubernetes/pki/ca.crt",
  "--service-account-private-key-file=/etc/kubernetes/pki/sa.key",
  "--service-cluster-ip-range=10.96.0.0/12",
  "--use-service-account-credentials=true"
],
…
(output truncated)
You can run the curl command from a Pod using a ServiceAccount token with enough permissions:
$ curl -k -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" https://localhost:10257/metrics
# HELP apiserver_audit_event_total [ALPHA] Counter of audit events generated and sent to the audit backend.
# TYPE apiserver_audit_event_total counter
apiserver_audit_event_total 0
# HELP apiserver_audit_requests_rejected_total [ALPHA] Counter of apiserver requests rejected due to an error in audit logging backend.
# TYPE apiserver_audit_requests_rejected_total counter
apiserver_audit_requests_rejected_total 0
# HELP apiserver_client_certificate_expiration_seconds [ALPHA] Distribution of the remaining lifetime on the certificate used to authenticate a request.
# TYPE apiserver_client_certificate_expiration_seconds histogram
apiserver_client_certificate_expiration_seconds_bucket{le="0"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="1800"} 0
(output truncated)
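For the token above to pass authorization, the ServiceAccount needs permission to GET the /metrics non-resource URL, since kube-controller-manager delegates authentication and authorization to the API server. A minimal RBAC sketch; the ServiceAccount name and namespace are only examples:

# Hypothetical ServiceAccount plus RBAC granting read access to /metrics.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: metrics-reader
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: controller-manager-metrics
rules:
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: controller-manager-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: controller-manager-metrics
subjects:
- kind: ServiceAccount
  name: metrics-reader
  namespace: default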
Or run the curl command from a master node, using the appropriate certificates to pass the authentication process:
[root@k8s-control-1 ~]# curl -k --cert /tmp/server.crt --key /tmp/server.key https://localhost:10257/metrics
# HELP apiserver_audit_event_total [ALPHA] Counter of audit events generated and sent to the audit backend.
# TYPE apiserver_audit_event_total counter
apiserver_audit_event_total 0
# HELP apiserver_audit_requests_rejected_total [ALPHA] Counter of apiserver requests rejected due to an error in audit logging backend.
# TYPE apiserver_audit_requests_rejected_total counter
apiserver_audit_requests_rejected_total 0
# HELP apiserver_client_certificate_expiration_seconds [ALPHA] Distribution of the remaining lifetime on the certificate used to authenticate a request.
# TYPE apiserver_client_certificate_expiration_seconds histogram
apiserver_client_certificate_expiration_seconds_bucket{le="0"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="1800"} 0
(output truncated)
How to configure Prometheus to scrape kube-controller-manager metrics
When it comes to scraping kube-controller-manager metrics, it is mandatory for the kube-controller-manager to listen on 0.0.0.0. Otherwise, the Prometheus Pod or any external Prometheus service won’t be able to reach the metrics endpoint.
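If your cluster was deployed with kubeadm, a minimal sketch of that change is editing the static Pod manifest on every master node; the kubelet recreates the Pod automatically when the file changes. Be aware that exposing the endpoint beyond localhost has security implications, so restrict access to it accordingly.

# /etc/kubernetes/manifests/kube-controller-manager.yaml (excerpt)
spec:
  containers:
  - command:
    - kube-controller-manager
    - --bind-address=0.0.0.0    # was 127.0.0.1
    # ... rest of the flags left unchanged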
In order to scrape metrics from the kube-controller-manager, let’s rely on the kubernetes_sd_config Pod role. Keep in mind what we discussed earlier: the metrics endpoint should be accessible from any Pod in the Kubernetes cluster. You only need to configure the appropriate job in the prometheus.yml config file.
This is the default job included out of the box with the Community Prometheus Helm Chart.
scrape_configs:
- honor_labels: true
  job_name: kubernetes-pods
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - action: keep
    regex: true
    source_labels:
    - __meta_kubernetes_pod_annotation_prometheus_io_scrape
  - action: replace
    regex: (https?)
    source_labels:
    - __meta_kubernetes_pod_annotation_prometheus_io_scheme
    target_label: __scheme__
  - action: replace
    regex: (.+)
    source_labels:
    - __meta_kubernetes_pod_annotation_prometheus_io_path
    target_label: __metrics_path__
  - action: replace
    regex: (.+?)(?::\d+)?;(\d+)
    replacement: $1:$2
    source_labels:
    - __address__
    - __meta_kubernetes_pod_annotation_prometheus_io_port
    target_label: __address__
  - action: labelmap
    regex: __meta_kubernetes_pod_annotation_prometheus_io_param_(.+)
    replacement: __param_$1
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - action: replace
    source_labels:
    - __meta_kubernetes_namespace
    target_label: namespace
  - action: replace
    source_labels:
    - __meta_kubernetes_pod_name
    target_label: pod
  - action: drop
    regex: Pending|Succeeded|Failed|Completed
    source_labels:
    - __meta_kubernetes_pod_phase
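Before reloading Prometheus with the new job, you can validate the edited configuration with promtool, which ships with the Prometheus distribution. The file path is just an example:

# Validate the Prometheus configuration, including the scrape job above.
promtool check config /etc/prometheus/prometheus.yml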
In addition, to make these Pods available for scraping, you’ll need to add the following annotations to the /etc/kubernetes/manifests/kube-controller-manager.yaml file. After editing this manifest on every master node, new kube-controller-manager Pods will be created.
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "10257"
    prometheus.io/scheme: "https"
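Once the kubelet recreates the static Pods, you can confirm that the annotations are in place and that Prometheus has discovered the new targets. A quick sketch, assuming the default kubeadm Pod labels and an example Prometheus address:

# Check the annotations on the recreated kube-controller-manager Pods.
kubectl get pods -n kube-system -l component=kube-controller-manager \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations}{"\n"}{end}'

# Ask the Prometheus HTTP API which targets were discovered.
curl -s http://prometheus.example.com:9090/api/v1/targets | grep kube-controller-manager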
Monitoring the kube-controller-manager: Which metrics should you check?
At this point, you have learned what the kube-controller-manager is and why it is so important to monitor this component in your infrastructure. You have also seen how to configure Prometheus to monitor the Kubernetes controller manager. So, the question now is:
Which kube-controller-manager metrics should you monitor?
Let’s cover this topic right now. Keep reading!
Disclaimer: kube-controller-manager server metrics might differ between Kubernetes versions. Here, we used Kubernetes 1.25. You can check the metrics available for your version in the Kubernetes repo.
workqueue_queue_duration_seconds_bucket: The time that kube-controller-manager takes to fulfill the different actions needed to keep the desired state of the cluster.

# HELP workqueue_queue_duration_seconds [ALPHA] How long in seconds an item stays in workqueue before being requested.
# TYPE workqueue_queue_duration_seconds histogram
workqueue_queue_duration_seconds_bucket{name="ClusterRoleAggregator",le="1e-08"} 0
workqueue_queue_duration_seconds_bucket{name="ClusterRoleAggregator",le="1e-07"} 0
workqueue_queue_duration_seconds_bucket{name="ClusterRoleAggregator",le="1e-06"} 0
workqueue_queue_duration_seconds_bucket{name="ClusterRoleAggregator",le="9.999999999999999e-06"} 0
workqueue_queue_duration_seconds_bucket{name="ClusterRoleAggregator",le="9.999999999999999e-05"} 0
workqueue_queue_duration_seconds_bucket{name="ClusterRoleAggregator",le="0.001"} 0
workqueue_queue_duration_seconds_bucket{name="ClusterRoleAggregator",le="0.01"} 0
workqueue_queue_duration_seconds_bucket{name="ClusterRoleAggregator",le="0.1"} 3
workqueue_queue_duration_seconds_bucket{name="ClusterRoleAggregator",le="1"} 3
workqueue_queue_duration_seconds_bucket{name="ClusterRoleAggregator",le="10"} 3
workqueue_queue_duration_seconds_bucket{name="ClusterRoleAggregator",le="+Inf"} 3
workqueue_queue_duration_seconds_sum{name="ClusterRoleAggregator"} 0.237713632
workqueue_queue_duration_seconds_count{name="ClusterRoleAggregator"} 3
A good way to represent this is using quantiles. With the following example, you can check the 99th percentile of the time items spend in the workqueue before the kube-controller-manager processes them.
histogram_quantile(0.99,sum(rate(workqueue_queue_duration_seconds_bucket{container="kube-controller-manager"}[5m])) by (instance, name, le))
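If you want to be notified when this latency degrades, you could turn the query above into a Prometheus alerting rule. A minimal sketch; the rule group name, threshold, and duration are only examples and should be tuned to your cluster:

groups:
- name: kube-controller-manager.rules
  rules:
  - alert: KubeControllerManagerWorkqueueLatencyHigh
    expr: |
      histogram_quantile(0.99, sum(rate(workqueue_queue_duration_seconds_bucket{container="kube-controller-manager"}[5m])) by (instance, name, le)) > 1
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "Workqueue {{ $labels.name }} on {{ $labels.instance }} is slow to process items"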
workqueue_adds_total: This metric measures the number of additions handled by the workqueue. A high value might indicate problems in the cluster, or in some of the nodes.

# HELP workqueue_adds_total [ALPHA] Total number of adds handled by workqueue
# TYPE workqueue_adds_total counter
workqueue_adds_total{name="ClusterRoleAggregator"} 3
workqueue_adds_total{name="DynamicCABundle-client-ca-bundle"} 1
workqueue_adds_total{name="DynamicCABundle-csr-controller"} 5
workqueue_adds_total{name="DynamicCABundle-request-header"} 1
workqueue_adds_total{name="DynamicServingCertificateController"} 169
workqueue_adds_total{name="bootstrap_signer_queue"} 1
workqueue_adds_total{name="certificate"} 0
workqueue_adds_total{name="claims"} 1346
workqueue_adds_total{name="cronjob"} 0
workqueue_adds_total{name="daemonset"} 591
workqueue_adds_total{name="deployment"} 101066
workqueue_adds_total{name="disruption"} 30
workqueue_adds_total{name="disruption_recheck"} 0
You may want to check the rate of additions to the kube-controller-manager workqueue with the following query.
sum(rate(workqueue_adds_total{container="kube-controller-manager"}[5m])) by (instance, name)
workqueue_depth: This metric lets you verify how big the workqueue is, that is, how many actions in the workqueue are waiting to be processed. It should remain a low value. The bigger the workqueue, the more work the controller manager has pending, so a growing trend may indicate problems in your Kubernetes cluster. The following query lets you easily see how fast the kube-controller-manager queue is growing.

sum(rate(workqueue_depth{container="kube-controller-manager"}[5m])) by (instance, name)

# HELP workqueue_depth [ALPHA] Current depth of workqueue
# TYPE workqueue_depth gauge
workqueue_depth{name="ClusterRoleAggregator"} 0
workqueue_depth{name="DynamicCABundle-client-ca-bundle"} 0
workqueue_depth{name="DynamicCABundle-csr-controller"} 0
workqueue_depth{name="DynamicCABundle-request-header"} 0
workqueue_depth{name="DynamicServingCertificateController"} 0
workqueue_depth{name="bootstrap_signer_queue"} 0
workqueue_depth{name="certificate"} 0
workqueue_depth{name="claims"} 0
workqueue_depth{name="cronjob"} 0
workqueue_depth{name="daemonset"} 0
workqueue_depth{name="deployment"} 0
workqueue_depth{name="disruption"} 0
workqueue_depth{name="disruption_recheck"} 0
workqueue_depth{name="endpoint"} 0
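As with latency, a sustained backlog is more telling than a momentary spike, so you may prefer to alert only when the queue stays deep for a while. A sketch extending the hypothetical rule group shown earlier; the threshold and duration are just examples:

  - alert: KubeControllerManagerWorkqueueBacklog
    expr: sum(workqueue_depth{container="kube-controller-manager"}) by (instance, name) > 100
    for: 15m
    labels:
      severity: warning
    annotations:
      summary: "Workqueue {{ $labels.name }} on {{ $labels.instance }} has a sustained backlog"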
rest_client_request_duration_seconds_bucket: This metric measures the latency, in seconds, of calls to the API server. It is a good way to monitor the communication between the kube-controller-manager and the API server, and to check whether these requests are being responded to within the expected time.

# HELP rest_client_request_duration_seconds [ALPHA] Request latency in seconds. Broken down by verb, and host.
# TYPE rest_client_request_duration_seconds histogram
rest_client_request_duration_seconds_bucket{host="192.168.119.30:6443",verb="GET",le="0.005"} 15932
rest_client_request_duration_seconds_bucket{host="192.168.119.30:6443",verb="GET",le="0.025"} 28868
rest_client_request_duration_seconds_bucket{host="192.168.119.30:6443",verb="GET",le="0.1"} 28915
rest_client_request_duration_seconds_bucket{host="192.168.119.30:6443",verb="GET",le="0.25"} 28943
rest_client_request_duration_seconds_bucket{host="192.168.119.30:6443",verb="GET",le="0.5"} 29001
rest_client_request_duration_seconds_bucket{host="192.168.119.30:6443",verb="GET",le="1"} 29066
rest_client_request_duration_seconds_bucket{host="192.168.119.30:6443",verb="GET",le="2"} 29079
rest_client_request_duration_seconds_bucket{host="192.168.119.30:6443",verb="GET",le="4"} 29079
rest_client_request_duration_seconds_bucket{host="192.168.119.30:6443",verb="GET",le="8"} 29081
rest_client_request_duration_seconds_bucket{host="192.168.119.30:6443",verb="GET",le="15"} 29081
rest_client_request_duration_seconds_bucket{host="192.168.119.30:6443",verb="GET",le="30"} 29081
rest_client_request_duration_seconds_bucket{host="192.168.119.30:6443",verb="GET",le="60"} 29081
rest_client_request_duration_seconds_bucket{host="192.168.119.30:6443",verb="GET",le="+Inf"} 29081
rest_client_request_duration_seconds_sum{host="192.168.119.30:6443",verb="GET"} 252.18190490699962
rest_client_request_duration_seconds_count{host="192.168.119.30:6443",verb="GET"} 29081
Use this query if you want to calculate the 99th percentile of latencies on requests to the Kubernetes API server.
histogram_quantile(0.99, sum(rate(rest_client_request_duration_seconds_bucket{container="kube-controller-manager"}[5m])) by (verb, host, le))
rest_client_requests_total: This metric provides the number of HTTP client requests from kube-controller-manager, broken down by HTTP response code.

# HELP rest_client_requests_total [ALPHA] Number of HTTP requests, partitioned by status code, method, and host.
# TYPE rest_client_requests_total counter
rest_client_requests_total{code="200",host="192.168.119.30:6443",method="GET"} 31308
rest_client_requests_total{code="200",host="192.168.119.30:6443",method="PATCH"} 114
rest_client_requests_total{code="200",host="192.168.119.30:6443",method="PUT"} 5543
rest_client_requests_total{code="201",host="192.168.119.30:6443",method="POST"} 34
rest_client_requests_total{code="503",host="192.168.119.30:6443",method="GET"} 9
rest_client_requests_total{code="<error>",host="192.168.119.30:6443",method="GET"} 2
If you want to get the rate of successful (HTTP 2xx) client requests, run the following query.
sum(rate(rest_client_requests_total{container="kube-controller-manager",code=~"2.."}[5m]))
For the rate of HTTP 3xx (redirection) client requests, use the following query.
sum(rate(rest_client_requests_total{container="kube-controller-manager",code=~"3.."}[5m]))
The following query shows the rate of HTTP 4xx (client error) requests. Monitor it closely to detect any client error responses.
sum(rate(rest_client_requests_total{container="kube-controller-manager",code=~"4.."}[5m]))
Lastly, if you want to monitor HTTP 5xx (server error) requests, use the following query.
sum(rate(rest_client_requests_total{container="kube-controller-manager",code=~"5.."}[5m]))
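A useful way to combine these is to watch the error ratio rather than absolute rates; a value close to zero means almost all requests to the API server succeed. A sketch of one possible definition, counting 4xx and 5xx responses as errors:

sum(rate(rest_client_requests_total{container="kube-controller-manager",code=~"(4|5).."}[5m]))
  /
sum(rate(rest_client_requests_total{container="kube-controller-manager"}[5m]))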
process_cpu_seconds_total: The total CPU time spent, in seconds, by kube-controller-manager, per instance.

# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 279.2
You can get the rate of CPU time spent by running this query.
rate(process_cpu_seconds_total{container="kube-controller-manager"}[5m])
process_resident_memory_bytes: This metric measures the resident memory size, in bytes, of kube-controller-manager, per instance.

# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 1.4630912e+08
Easily monitor the kube-controller-manager resident memory size with this query. Since the metric is a gauge, you can graph its value directly rather than a rate.
process_resident_memory_bytes{container="kube-controller-manager"}
Conclusion
In this article, you have learned that the Kubernetes controller manager is responsible for reconciling the actual state of Kubernetes objects with their desired state, communicating with the Kubernetes API server via a watch mechanism. Since this internal component is an important piece of the Kubernetes control plane, monitoring kube-controller-manager is key to preventing any issues that may come up.
Monitor kube-controller-manager and troubleshoot issues up to 10x faster
Sysdig can help you monitor and troubleshoot problems with kube-controller-manager and other parts of the Kubernetes control plane with the out-of-the-box dashboards included in Sysdig Monitor. Advisor, a tool integrated into Sysdig Monitor, accelerates troubleshooting of your Kubernetes clusters and their workloads by up to 10x.
Sign up for a 30-day trial account and try it yourself!