How to Monitor Kubernetes API Server

Monitoring the Kubernetes API server is vital when running Kubernetes in production. Monitoring kube-apiserver lets you detect and troubleshoot latency and errors, and validate that the service performs as expected. Keep reading to learn how to collect the most important metrics from kube-apiserver and use them to monitor this service.

The Kubernetes API server is a foundational component of the Kubernetes control plane. All of the services running inside the cluster use this interface to communicate with each other. All user interaction is handled through the API as well: kubectl is a wrapper that sends requests to the API. kubectl and the rest of the control plane components connect to the API server over HTTP, while the API server itself talks to etcd over gRPC. We should be ready to monitor both channels.

[Figure: Kubernetes control plane]

Like with any other microservice, we are going to take the Golden Signals approach to monitor the Kubernetes API server health and performance:

  • Latency
  • Request rate
  • Errors
  • Saturation

But before we dive into the meaning of each one, let’s see how to fetch those metrics.

Getting the metrics to monitor kube-apiserver

The API server is instrumented and exposes Prometheus metrics by default, covering latency, requests, errors and etcd cache status. This endpoint can be scraped directly, yielding useful information without additional scripts or exporters.

The API server requires authentication for requests to the /metrics endpoint, so you need credentials with privileges for that. If you are running Prometheus inside the cluster, you can authenticate using a service account bound to a ClusterRole that grants GET requests to the /metrics endpoint.

This can be done by adding one rule to the ClusterRole used by Prometheus:
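
The rule itself is not reproduced in this text. As a hedged sketch, Kubernetes RBAC grants access to non-resource endpoints such as /metrics through the nonResourceURLs field of a rule. The snippet below builds such a rule as a Python dictionary and prints it as a JSON manifest. The ClusterRole name "prometheus" is an assumption for illustration, not something taken from this post.

```python
import json

# RBAC rule granting read-only (GET) access to the non-resource /metrics endpoint.
metrics_rule = {
    "nonResourceURLs": ["/metrics"],
    "verbs": ["get"],
}

# Hypothetical ClusterRole wrapping the rule; the name "prometheus" is an assumption.
cluster_role = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "ClusterRole",
    "metadata": {"name": "prometheus"},
    "rules": [metrics_rule],
}

# Print the manifest as JSON, which can be applied like a YAML manifest.
print(json.dumps(cluster_role, indent=2))
```

In an existing ClusterRole you would only append the rule to the rules list rather than create a new role.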

This way, we can access /metrics endpoint using the bearer token from the service account, present in the pod, in /var/run/secrets/

We can test the authentication by executing this shell command from within the Prometheus pod:

#curl  https://kubernetes.default.svc/metrics -H "Authorization: Bearer $(cat /var/run/secrets/" --cacert /var/run/secrets/

It will return a long list of Prometheus metrics (truncated here):

# TYPE APIServiceOpenAPIAggregationControllerQueue1_adds counter
APIServiceOpenAPIAggregationControllerQueue1_adds 108089
# HELP APIServiceOpenAPIAggregationControllerQueue1_depth Current depth of workqueue: APIServiceOpenAPIAggregationControllerQueue1
# TYPE APIServiceOpenAPIAggregationControllerQueue1_depth gauge
APIServiceOpenAPIAggregationControllerQueue1_depth 0
# HELP APIServiceOpenAPIAggregationControllerQueue1_queue_latency How long an item stays in workqueueAPIServiceOpenAPIAggregationControllerQueue1 before being requested.
# TYPE APIServiceOpenAPIAggregationControllerQueue1_queue_latency summary
APIServiceOpenAPIAggregationControllerQueue1_queue_latency{quantile="0.5"} 15
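
To make the structure of this output concrete, here is a minimal sketch of a parser for the Prometheus text format. It is an illustration only, assuming well-formed input: the function name parse_metrics is hypothetical, and comment lines (# HELP, # TYPE) are skipped rather than interpreted.

```python
import re

# metric_name, optional {labels}, value, optional trailing timestamp
SAMPLE_RE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$'
)
# label="value" pairs, allowing backslash-escaped characters inside the value
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines such as "# HELP" and "# TYPE".
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore lines this simple sketch cannot parse
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


example = """\
# TYPE APIServiceOpenAPIAggregationControllerQueue1_adds counter
APIServiceOpenAPIAggregationControllerQueue1_adds 108089
APIServiceOpenAPIAggregationControllerQueue1_queue_latency{quantile="0.5"} 15
"""

for sample in parse_metrics(example):
    print(sample)
```

Each sample is a metric name, a set of labels and a numeric value, and Prometheus stores that same triple after every scrape.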

Configuring Prometheus to scrape the Kubernetes API server endpoint can be done by adding one job to your targets:
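
The job definition is not reproduced in this text. The sketch below shows what such a job could look like, modeled on the commonly used Kubernetes service discovery setup and built as a Python dictionary. The function name apiserver_scrape_job is hypothetical, and the CA certificate and token paths are left as parameters because this post does not show them in full. Newer Prometheus releases may prefer a different authorization setting than bearer_token_file, so treat the field names as assumptions to check against your version.

```python
import json


def apiserver_scrape_job(ca_file, token_file):
    """Build a scrape job that keeps only the API server endpoints."""
    return {
        "job_name": "kubernetes-apiservers",
        # Discover endpoints through the Kubernetes API.
        "kubernetes_sd_configs": [{"role": "endpoints"}],
        "scheme": "https",
        "tls_config": {"ca_file": ca_file},
        "bearer_token_file": token_file,
        "relabel_configs": [
            {
                # Keep only the "kubernetes" service in the "default"
                # namespace, on its "https" port.
                "source_labels": [
                    "__meta_kubernetes_namespace",
                    "__meta_kubernetes_service_name",
                    "__meta_kubernetes_endpoint_port_name",
                ],
                "action": "keep",
                "regex": "default;kubernetes;https",
            }
        ],
    }


# Placeholder paths: substitute the service account CA and token files.
job = apiserver_scrape_job("<CA_FILE>", "<TOKEN_FILE>")
print(json.dumps({"scrape_configs": [job]}, indent=2))
```

The printed JSON is also valid YAML, so it can be merged into a Prometheus configuration file once the placeholders are replaced.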

Monitor Kubernetes API server: What to look for?

We can use Golden Signals to monitor the Kubernetes API server. Golden Signals is a technique for monitoring a service through a small set of metrics that show how it is performing for its consumers (here, kubectl users and the internal cluster components). These metrics are latency, requests, errors and saturation (how close the server is to its maximum capacity with its current resources).

Disclaimer: API server metrics might differ between Kubernetes versions. Here we used Kubernetes 1.15. You can check the metrics available for your version in the Kubernetes repo (link for the 1.15.3 version).

Latency: Latency can be extracted from the apiserver_request_duration_seconds histogram buckets:

# TYPE apiserver_request_duration_seconds histogram
apiserver_request_duration_seconds_bucket{resource="adapters",scope="cluster",subresource="",verb="LIST",le="125000"} 2
apiserver_request_duration_seconds_bucket{resource="adapters",scope="cluster",subresource="",verb="LIST",le="250000"} 2
apiserver_request_duration_seconds_bucket{resource="adapters",scope="cluster",subresource="",verb="LIST",le="500000"} 2
apiserver_request_duration_seconds_bucket{resource="adapters",scope="cluster",subresource="",verb="LIST",le="1e+06"} 2
apiserver_request_duration_seconds_bucket{resource="adapters",scope="cluster",subresource="",verb="LIST",le="2e+06"} 2
apiserver_request_duration_seconds_bucket{resource="adapters",scope="cluster",subresource="",verb="LIST",le="4e+06"} 2
apiserver_request_duration_seconds_bucket{resource="adapters",scope="cluster",subresource="",verb="LIST",le="8e+06"} 2
apiserver_request_duration_seconds_bucket{resource="adapters",scope="cluster",subresource="",verb="LIST",le="+Inf"} 2
apiserver_request_duration_seconds_sum{resource="adapters",scope="cluster",subresource="",verb="LIST"} 50270
apiserver_request_duration_seconds_count{resource="adapters",scope="cluster",subresource="",verb="LIST"} 2

It’s a good idea to use percentiles to understand the latency spread:

histogram_quantile(0.99, sum(rate(apiserver_request_duration_seconds_bucket{job="kubernetes-apiservers"}[5m])) by (verb, le))
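
To see what a percentile over histogram buckets means, the following sketch approximates the estimation in plain Python: find the first cumulative bucket that contains the target rank, then interpolate linearly inside it. It is a simplified illustration with made-up bucket values, not the Prometheus implementation, and it assumes cumulative counts with a final +Inf bucket.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile (0 < q <= 1) from cumulative buckets.

    buckets is a list of (upper_bound, cumulative_count) pairs that
    includes a final (math.inf, total_count) entry.
    """
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The quantile falls in the +Inf bucket: report the highest
                # finite upper bound instead of infinity.
                return prev_bound
            if count == prev_count:
                return bound  # empty bucket, nothing to interpolate
            # Linear interpolation between the bucket's lower and upper bound.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan


# Made-up example: 100 requests spread over three finite buckets (seconds).
example_buckets = [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]
print(histogram_quantile(0.99, example_buckets))
```

The estimate can never be more precise than the bucket boundaries, so a p99 that lands in a wide bucket is only a rough figure.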

Request rate: The metric apiserver_request_total can be used to monitor the requests to the service: where they come from, which resource they target, which action they perform and whether they were successful:

# TYPE apiserver_request_total counter
apiserver_request_total{client="Go-http-client/1.1",code="0",contentType="",resource="pods",scope="namespace",subresource="portforward",verb="CONNECT"} 4
apiserver_request_total{client="Go-http-client/2.0",code="200",contentType="application/json",resource="alertmanagers",scope="cluster",subresource="",verb="LIST"} 1
apiserver_request_total{client="Go-http-client/2.0",code="200",contentType="application/json",resource="alertmanagers",scope="cluster",subresource="",verb="WATCH"} 72082
apiserver_request_total{client="Go-http-client/2.0",code="200",contentType="application/json",resource="clusterinformations",scope="cluster",subresource="",verb="LIST"} 1

For example, you can get all the successful requests across the service like this:
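
The query itself is not reproduced in this text. One plausible form, sketched below, sums the per-second rate of apiserver_request_total over all series whose code label is a 2xx status. The helper request_rate_query is hypothetical and only builds the PromQL string; the job label value matches the one used in the latency query above, and the 5m window is an assumption.

```python
def request_rate_query(code_regex="2..", job="kubernetes-apiservers", window="5m"):
    """Build a PromQL expression for the request rate of matching status codes."""
    return (
        f'sum(rate(apiserver_request_total'
        f'{{job="{job}",code=~"{code_regex}"}}[{window}]))'
    )


# Rate of successful (2xx) requests across the whole service.
print(request_rate_query())
```

Adding a by (verb) or by (resource) clause after sum would break the same rate down per action or per resource.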


Errors: You can use the same query used for request rate, but filter for 4xx and 5xx error codes:
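
The error query is not reproduced in this text either. As a hedged sketch, the hypothetical helper below builds a PromQL expression for the share of requests that ended in a 4xx or 5xx status, which is often easier to alert on than a raw error rate. The job label value and the 5m window are assumptions.

```python
def error_ratio_query(job="kubernetes-apiservers", window="5m"):
    """Build a PromQL expression: rate of 4xx/5xx requests over rate of all requests."""
    selector = f'job="{job}"'
    errors = (
        f'sum(rate(apiserver_request_total'
        f'{{{selector},code=~"[45].."}}[{window}]))'
    )
    total = f'sum(rate(apiserver_request_total{{{selector}}}[{window}]))'
    return f"{errors} / {total}"


print(error_ratio_query())
```

The numerator on its own is the plain error rate, and the division turns it into a fraction between 0 and 1.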


Saturation: We can monitor saturation through system resource consumption metrics like CPU, memory and network I/O for this service.

In addition to its own metrics, the API server exposes other relevant metrics:

  • From controller-manager:
    • work queue addition rate: How fast are new actions being scheduled for the controller? These actions can include additions, deletions and modifications of any resource in the cluster (workloads, configmaps, services…).
    • work queue latency: How fast is the controller-manager performing these actions?
    • work queue depth: How many actions are waiting to be executed?
  • From etcd:
    • etcd cache entries: How many query results have been cached?
    • etcd cache hit/miss rate: Is the cache being useful?
    • etcd cache duration: How long are the cache results stored?

Examples of issues

You detect an increase of latency in the requests to the API.

This is typically a sign of overload in the API server. Your cluster is probably under heavy load and the API server needs to be scaled out.

You can segment the metrics by type of request, by resource or verb. This way you can detect where the problem is. Maybe you are having issues reading or writing to etcd and need to fix it.
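
As a rough sketch of this kind of segmentation, the snippet below groups already-parsed samples by a chosen label and divides the summed request duration by the request count. The function name mean_latency_by and the (name, labels, value) sample layout are assumptions for illustration. The result is an average since the API server started, so it is only a coarse first look compared with a rate-based query.

```python
from collections import defaultdict

SUM_METRIC = "apiserver_request_duration_seconds_sum"
COUNT_METRIC = "apiserver_request_duration_seconds_count"


def mean_latency_by(samples, label):
    """Average request duration per value of the given label (e.g. "verb")."""
    sums = defaultdict(float)
    counts = defaultdict(float)
    for name, labels, value in samples:
        key = labels.get(label, "")
        if name == SUM_METRIC:
            sums[key] += value
        elif name == COUNT_METRIC:
            counts[key] += value
    # Skip groups without any requests to avoid dividing by zero.
    return {key: sums[key] / counts[key] for key in counts if counts[key] > 0}


# Values taken from the histogram sample shown earlier in this post.
samples = [
    (SUM_METRIC, {"resource": "adapters", "verb": "LIST"}, 50270.0),
    (COUNT_METRIC, {"resource": "adapters", "verb": "LIST"}, 2.0),
]
print(mean_latency_by(samples, "verb"))
```

Calling the same function with "resource" instead of "verb" shows whether one resource type accounts for the slowdown.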

You detect an increase in the depth and the latency of the work queue.

You are having issues scheduling actions. You should check that the scheduler is working. Maybe some of your nodes are overloaded and you need to scale out your cluster. Maybe one node is having issues and you want to replace it.

Monitoring Kubernetes API server metrics in Sysdig Monitor

If you want to monitor Kubernetes API server using Sysdig Monitor, you just need to add a couple of sections to the Sysdig agent yaml configuration file:

With the metrics_filter part, you ensure that these metrics won’t be discarded if you hit the metrics limit. You can add any other metric offered by the API server that is not in this list, like this:

    - include: "apiserver_request_total"
    - include: "go_goroutines"

Then, you configure how the Sysdig agent will scrape the metrics: it searches for the Kubernetes pods that have the label kube-apiserver and scrapes them on localhost through port 8080. As the Sysdig agent can switch network context and connect to the pod as if it were on localhost, we don’t need to use HTTPS. You must also specify the authentication token the agent will use to access the metrics endpoint:

    enabled: true
      - include:
          kubernetes.pod.label.k8s-app: kube-apiserver
          port: 8080
            port: 8080
            use_https: false
            auth_token_path: "/var/run/secrets/"

You can then build custom dashboards using these metrics. We have some pre-built dashboards that we can share with you if you are interested.

[Figure: Kubernetes API server Sysdig dashboard]


Monitoring Kubernetes API server is fundamental as it is a key piece in the cluster operation. Remember, all the communication between the cluster components is done via kube-apiserver.

Detecting issues in the API server can be a key factor for fixing issues in your Kubernetes clusters. You should be aware of what is going on in your control plane components and learn how to leverage that in your favour when problems come, and they will.

Good monitoring of all the Kubernetes components can be as important as monitoring the workloads and applications running inside the cluster. Don’t forget to monitor your control plane!
