How to monitor etcd

By David Lorite Solanas - AUGUST 6, 2020

Learning how to monitor etcd is of vital importance when running Kubernetes in production. Monitoring etcd lets you validate that the service performs as expected, while detecting and troubleshooting issues that could take your entire infrastructure down. Keep reading to learn how to collect the most important metrics from etcd and use them to monitor this service.

etcd is a foundational component of the Kubernetes control plane. It stores your cluster’s desired state (pods, secrets, deployments, etc.), among other things. If this service isn’t running, you won’t be able to deploy anything and the cluster can’t self-heal.

What is etcd?

The motivation of etcd is to provide a distributed, dynamic key-value database that maintains a “configuration registry.” This registry is one of the foundations of a Kubernetes cluster’s service directory, peer discovery, and centralized configuration management. It bears a certain resemblance to a Redis database, classical LDAP configuration backends, or even the Windows Registry, if you are more familiar with those technologies.

etcd is part of the Kubernetes control plane. It typically sits on the master node with the API server, the controller manager, the scheduler, kube-dns, and kubelet. Other nodes communicate via the API server, and contain services like kubelet, kube-proxy, and the container runtime.

According to its developers, etcd aims to be:
  • Simple: well-defined, user-facing API (JSON and gRPC)
  • Secure: automatic TLS with optional client cert authentication
  • Fast: benchmarked 10,000 writes/sec
  • Reliable: properly distributed using Raft
Kubernetes uses the etcd distributed database to store its REST API objects (under the /registry directory key): pods, secrets, daemonsets, deployments, namespaces, events, etc.

Raft is a “consensus” algorithm, a method to achieve value convergence over a distributed and fault-tolerant set of cluster nodes. Without going into the gory details that you will find in the referenced articles, here are the basics of what you need to know:
  • Node status can be one of: Follower, Candidate (briefly), Leader
  • If a Follower cannot locate the current Leader, it will become Candidate
  • The voting system will elect a new Leader amongst the Candidates
  • Registry value updates (commits) always go through the Leader
  • Once the Leader has received the ack from the majority of Followers the new value is considered “committed”
  • The cluster will survive as long as most of the nodes remain alive
Perhaps the most remarkable feature of etcd is the straightforward way of accessing the service using REST-like HTTP calls. This makes integrating third-party agents as simple as it gets. Its master-master protocol automatically elects the cluster Leader and provides a fallback mechanism to switch this role if needed.
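As a rough illustration of the majority rule in the list above, the sketch below computes the quorum size and the number of member failures a cluster of a given size can absorb. The function names are made up for this example; only the arithmetic (a majority means more than half of the members) comes from Raft.

```python
def quorum_size(members: int) -> int:
    """Smallest number of members that forms a majority."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def failures_tolerated(members: int) -> int:
    """How many members can be lost while a majority is still alive."""
    return members - quorum_size(members)


if __name__ == "__main__":
    for size in (1, 3, 4, 5):
        print(size, quorum_size(size), failures_tolerated(size))
```

Under this rule, going from three to four members does not add any fault tolerance, which is one reason odd cluster sizes are the usual choice.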

etcd cluster common points of failure

Most of the time, your etcd cluster works so neatly that it’s easy to forget its nodes are running. Keep in mind, however, that Kubernetes absolutely needs this registry to function, and a major etcd failure will seriously cripple or even take down your container infrastructure. Pods currently running will continue to run, but you cannot perform any further operations. When etcd and Kubernetes reconnect, state inconsistencies could cause additional malfunctions.

How to monitor etcd?

You can run etcd in Kubernetes, inside Docker containers, or as an independent cluster (in virtual machines or directly on bare metal). Usually, for simple scenarios, etcd is deployed in a Docker container like other Kubernetes services, such as the API server, controller-manager, scheduler, or kubelet. In more advanced scenarios, etcd is often an external service; in these cases, you will normally see three or more nodes to achieve the required redundancy.

Getting metrics from etcd

etcd is instrumented and exposes Prometheus metrics out of the box. In the cluster used here, they are served on port 4001 of the master host (a stock etcd serves client traffic on port 2379, so adjust the port to your deployment). The endpoint can be scraped directly, without additional scripts or exporters.

You can’t scrape etcd metrics by accessing the port on the node without authentication. etcd is the core of any Kubernetes cluster, so its metrics are secured too. To get them, you need access to port 4001 (or a shell on the master itself) and the client certificates. If you have access to the master node, run curl from there with the client certificate paths; the certificate is in /etc/kubernetes/pki/etcd-manager-main/etcd-clients-ca.crt and the key in /etc/kubernetes/pki/etcd-manager-main/etcd-clients-ca.key.
curl https://localhost:4001/metrics -k --cert /etc/kubernetes/pki/etcd-manager-main/etcd-clients-ca.crt --key /etc/kubernetes/pki/etcd-manager-main/etcd-clients-ca.key
If you want to connect from outside the master node, and you have the certificates from the master node and port 4001 open, you can use the node’s IP instead:
curl https://[master_ip]:4001/metrics -k --cert /etc/kubernetes/pki/etcd-manager-main/etcd-clients-ca.crt --key /etc/kubernetes/pki/etcd-manager-main/etcd-clients-ca.key
It will return a long list of metrics with this structure (truncated):
# HELP etcd_disk_backend_snapshot_duration_seconds The latency distribution of backend snapshots.
# TYPE etcd_disk_backend_snapshot_duration_seconds histogram
etcd_disk_backend_snapshot_duration_seconds_bucket{le="0.01"} 0
etcd_disk_backend_snapshot_duration_seconds_bucket{le="0.02"} 0
etcd_disk_backend_snapshot_duration_seconds_bucket{le="0.04"} 0
etcd_disk_backend_snapshot_duration_seconds_bucket{le="0.08"} 0
etcd_disk_backend_snapshot_duration_seconds_bucket{le="0.16"} 0
etcd_disk_backend_snapshot_duration_seconds_bucket{le="0.32"} 3286
etcd_disk_backend_snapshot_duration_seconds_bucket{le="0.64"} 4617
etcd_disk_backend_snapshot_duration_seconds_bucket{le="1.28"} 4620
etcd_disk_backend_snapshot_duration_seconds_bucket{le="2.56"} 4620
etcd_disk_backend_snapshot_duration_seconds_bucket{le="5.12"} 4620
etcd_disk_backend_snapshot_duration_seconds_bucket{le="10.24"} 4620
etcd_disk_backend_snapshot_duration_seconds_bucket{le="20.48"} 4620
etcd_disk_backend_snapshot_duration_seconds_bucket{le="40.96"} 4620
etcd_disk_backend_snapshot_duration_seconds_bucket{le="81.92"} 4620
etcd_disk_backend_snapshot_duration_seconds_bucket{le="163.84"} 4620
etcd_disk_backend_snapshot_duration_seconds_bucket{le="327.68"} 4620
etcd_disk_backend_snapshot_duration_seconds_bucket{le="655.36"} 4620
etcd_disk_backend_snapshot_duration_seconds_bucket{le="+Inf"} 4620
etcd_disk_backend_snapshot_duration_seconds_sum 1397.2374600930025
etcd_disk_backend_snapshot_duration_seconds_count 4620
# HELP etcd_disk_wal_fsync_duration_seconds The latency distributions of fsync called by wal.
# TYPE etcd_disk_wal_fsync_duration_seconds histogram
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.001"} 4.659349e+06
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.002"} 7.276276e+06
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.004"} 8.589085e+06
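For readers who prefer a script over curl, here is a minimal Python sketch of the same request plus a very small parser for the returned text. It uses only the standard library. The function names are invented for this example, the paths and port are copied from the curl commands above, and the parser assumes lines without trailing timestamps, which is what the output shown here looks like.

```python
import ssl
import urllib.request

# Paths and port copied from the curl commands above; adjust for your cluster.
CERT = "/etc/kubernetes/pki/etcd-manager-main/etcd-clients-ca.crt"
KEY = "/etc/kubernetes/pki/etcd-manager-main/etcd-clients-ca.key"


def fetch_metrics(host="localhost", port=4001, cert=CERT, key=KEY):
    """Fetch the raw /metrics text, authenticating with a client certificate."""
    ctx = ssl.create_default_context()
    # Same effect as curl -k: the server certificate is not verified.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    ctx.load_cert_chain(certfile=cert, keyfile=key)
    url = f"https://{host}:{port}/metrics"
    with urllib.request.urlopen(url, context=ctx, timeout=10) as resp:
        return resp.read().decode("utf-8")


def parse_samples(text):
    """Turn exposition text into {(metric_name, label_string): value}.

    Simplification: comment lines are skipped and every sample line is
    assumed to end with its value (no timestamp).
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        series, _, value = line.rpartition(" ")
        name, _, labels = series.partition("{")
        samples[(name, labels.rstrip("}"))] = float(value)
    return samples
```

On a master node, `parse_samples(fetch_metrics())` would give a dictionary you can query by metric name and label string, for example `("etcd_disk_backend_snapshot_duration_seconds_count", "")`.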
If you want to configure Prometheus to scrape etcd, you have to mount the certificates and create the job. The certificates are located on the master node in /etc/kubernetes/pki/etcd-manager-main/etcd-clients-ca.key and /etc/kubernetes/pki/etcd-manager-main/etcd-clients-ca.crt. Download them and create a secret in Kubernetes with the first command below; the two patch commands then mount that secret into the Prometheus server, and the job that follows goes into the Prometheus scrape configuration.

Disclaimer: etcd is the core of any Kubernetes cluster. If you are careless with the certificates, you can expose the entire cluster and make it a potential target.
kubectl -n monitoring create secret generic etcd-ca --from-file=etcd-clients-ca.key --from-file etcd-clients-ca.crt
kubectl -n monitoring patch deployment prometheus-server -p '{"spec":{"template":{"spec":{"volumes":[{"name":"etcd-ca","secret":{"defaultMode":420,"secretName":"etcd-ca"}}]}}}}'
kubectl -n monitoring patch deployment prometheus-server -p '{"spec":{"template":{"spec":{"containers":[{"name":"prometheus-server","volumeMounts": [{"mountPath": "/opt/prometheus/secrets","name": "etcd-ca"}]}]}}}}'
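    - job_name: etcd
      scheme: https
      # Discover pods through the Kubernetes API
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      # Keep only the etcd pods in kube-system
      - action: keep
        source_labels:
        - __meta_kubernetes_namespace
        - __meta_kubernetes_pod_name
        separator: '/'
        regex: 'kube-system/etcd-manager-main.+'
      # Point the scrape address at port 4001, replacing any discovered port
      - source_labels:
        - __address__
        action: replace
        target_label: __address__
        regex: '([^:]+)(?::\d+)?'
        replacement: $1:4001
      tls_config:
        # Same effect as curl -k: the server certificate is not verified
        insecure_skip_verify: true
        cert_file: /opt/prometheus/secrets/etcd-clients-ca.crt
        key_file: /opt/prometheus/secrets/etcd-clients-ca.key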
You can customize your own labels and relabeling configuration.
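To get a feel for what the address relabeling step does, the following Python sketch mimics a Prometheus "replace" action on the __address__ label. Prometheus anchors relabeling regexes at both ends, which `re.fullmatch` imitates here. The pattern `([^:]+)(?::\d+)?` and the function name are assumptions chosen for illustration, not necessarily the exact expression used in your job.

```python
import re


def relabel_address(address, port=4001, pattern=r"([^:]+)(?::\d+)?"):
    """Rewrite 'host' or 'host:port' to 'host:<port>'.

    Like a Prometheus replace action, the value is left untouched
    when the (fully anchored) pattern does not match.
    """
    match = re.fullmatch(pattern, address)
    if match is None:
        return address
    return f"{match.group(1)}:{port}"


if __name__ == "__main__":
    for addr in ("10.0.0.5", "10.0.0.5:2380"):
        print(addr, "->", relabel_address(addr))
```

Note that this simple pattern only handles IPv4 addresses and hostnames; a bracketed IPv6 address would not match and would pass through unchanged.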

Monitoring etcd: What to look for?

Disclaimer: etcd metrics might differ between Kubernetes versions. Here, we used Kubernetes 1.15. You can check the metrics available for your version in the Kubernetes repo for that release (1.15.3 in our case).

etcd node availability: An obvious error scenario for any cluster is that you lose one of the nodes. The cluster will continue operating, but it’s probably a good idea to receive an alert, diagnose, and recover before you continue losing nodes and risk facing the next scenario, total service failure. The simplest way to check this is with a PromQL query:
sum(up{job="etcd"})
This gives the number of nodes running. If a node is down, the number drops below your cluster size; the worst case is 0, which means no etcd node can be scraped at all.

etcd has a leader: One key metric is to know if all nodes have a leader. If one node does not have a leader, this node will be unavailable. And if no node has a leader, the cluster will become totally unavailable. To check this, there is a metric that indicates whether a node has a leader.
# HELP etcd_server_has_leader Whether or not a leader exists. 1 is existence, 0 is not.
# TYPE etcd_server_has_leader gauge
etcd_server_has_leader 1
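The two checks described so far (is each node up, and does each node see a leader) can be combined in a few lines. The sketch below is only an illustration: the input dictionary, its keys, and the function name are hypothetical, standing in for values you would read from the up and etcd_server_has_leader series of each node.

```python
def evaluate_cluster(nodes):
    """Summarize node availability and leader visibility.

    nodes maps a node name to its scraped values, for example
    {"etcd-a": {"up": 1, "has_leader": 1}}.
    """
    up = [name for name, values in nodes.items() if values.get("up") == 1]
    leaderless = [name for name in up if nodes[name].get("has_leader") != 1]
    return {
        "up": len(up),
        "down": len(nodes) - len(up),
        "leaderless": sorted(leaderless),
    }


if __name__ == "__main__":
    print(evaluate_cluster({
        "etcd-a": {"up": 1, "has_leader": 1},
        "etcd-b": {"up": 1, "has_leader": 0},
        "etcd-c": {"up": 0},
    }))
```

A node that is down cannot report whether it has a leader, so it is counted as down rather than leaderless.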
etcd leader changes: The leader can change over time, but overly frequent changes can hurt the performance of etcd itself. They can also signal that the leader is unstable because of connectivity problems, or that etcd is under too much load.
# HELP etcd_server_leader_changes_seen_total The number of leader changes seen.
# TYPE etcd_server_leader_changes_seen_total counter
etcd_server_leader_changes_seen_total 1
Consensus proposal: A proposal is a request (e.g., a write request or a configuration change request) that needs to go through the Raft protocol. The proposal metrics come in four types: committed, applied, pending, and failed. All four can reveal problems etcd is facing, but the most important is the failed one. Failed proposals usually have one of two causes: either a leader election is failing or quorum has been lost. For example, to alert when more than five consensus proposals fail over the course of a 15 minute period, use increase(), which counts events in the window (rate() would return a per-second average instead):
increase(etcd_server_proposals_failed_total{job=~"etcd"}[15m]) > 5
# HELP etcd_server_proposals_applied_total The total number of consensus proposals applied.
# TYPE etcd_server_proposals_applied_total gauge
etcd_server_proposals_applied_total 1.3605153e+07
# HELP etcd_server_proposals_committed_total The total number of consensus proposals committed.
# TYPE etcd_server_proposals_committed_total gauge
etcd_server_proposals_committed_total 1.3605153e+07
# HELP etcd_server_proposals_failed_total The total number of failed proposals seen.
# TYPE etcd_server_proposals_failed_total counter
etcd_server_proposals_failed_total 0
# HELP etcd_server_proposals_pending The current number of pending proposals to commit.
# TYPE etcd_server_proposals_pending gauge
etcd_server_proposals_pending 0
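Counters such as etcd_server_proposals_failed_total and etcd_server_leader_changes_seen_total only ever grow, so alerts are built on how much they grew inside a time window. The sketch below shows the idea behind a windowed increase and a per-second rate, including the usual handling of a counter reset after a restart. It is a simplified model with made-up function names; Prometheus additionally extrapolates to the window boundaries, which is not reproduced here.

```python
def counter_increase(samples):
    """Total growth of a counter over time-ordered (timestamp, value) samples.

    A drop in value is treated as a reset to zero, so the new value
    counts in full.
    """
    total = 0.0
    previous = samples[0][1]
    for _, value in samples[1:]:
        total += value if value < previous else value - previous
        previous = value
    return total


def per_second_rate(samples):
    """Average growth per second between the first and last sample."""
    elapsed = samples[-1][0] - samples[0][0]
    return counter_increase(samples) / elapsed if elapsed > 0 else 0.0


if __name__ == "__main__":
    window = [(0, 0), (300, 2), (600, 4), (900, 6)]  # 15 minutes of samples
    print(counter_increase(window), per_second_rate(window))
```

In this example, six failed proposals in 15 minutes correspond to a rate of roughly 0.0067 per second, which is why a threshold meant as "events per window" has to be compared against the increase and not against the per-second rate.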
Disk sync duration: Because etcd stores all of the important state of Kubernetes, the speed of committing changes to disk and the health of your storage are key indicators of whether etcd is working properly. If disk sync latencies are high, the disk may have issues and the cluster can become unavailable. The metrics that show this are etcd_disk_wal_fsync_duration_seconds and etcd_disk_backend_commit_duration_seconds.
# HELP etcd_disk_backend_commit_duration_seconds The latency distributions of commit called by backend.
# TYPE etcd_disk_backend_commit_duration_seconds histogram
etcd_disk_backend_commit_duration_seconds_bucket{le="0.001"} 0
etcd_disk_backend_commit_duration_seconds_bucket{le="0.002"} 5.402102e+06
etcd_disk_backend_commit_duration_seconds_bucket{le="0.004"} 6.0471e+06
etcd_disk_backend_commit_duration_seconds_sum 11017.523900176226
etcd_disk_backend_commit_duration_seconds_count 6.157407e+06
# HELP etcd_disk_wal_fsync_duration_seconds The latency distributions of fsync called by wal.
# TYPE etcd_disk_wal_fsync_duration_seconds histogram
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.001"} 4.659349e+06
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.002"} 7.276276e+06
etcd_disk_wal_fsync_duration_seconds_sum 11580.35429902582
etcd_disk_wal_fsync_duration_seconds_count 8.786736e+06
To know whether backend commits are fast enough, look at their latency distribution in the histogram. The following query shows the latency below which 99% of commits complete:
histogram_quantile(0.99, rate(etcd_disk_backend_commit_duration_seconds_bucket{job=~"etcd"}[5m]))
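To make the quantile estimate less of a black box, here is a small Python sketch of the bucket interpolation that a histogram quantile is based on: find the bucket where the requested rank falls, then interpolate linearly inside it. It works on raw cumulative counts rather than on rate() results, and the function is a simplified model written for this article, not Prometheus code.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets is a list of (upper_bound, cumulative_count) sorted by
    upper bound, with the +Inf bucket last.
    """
    if not buckets or not 0 <= q <= 1:
        raise ValueError("need buckets and 0 <= q <= 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return buckets[-2][0] if len(buckets) > 1 else math.nan
            if count == prev_count:
                return bound
            # Linear interpolation inside the bucket that contains the rank.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return buckets[-1][0]


if __name__ == "__main__":
    # A subset of the snapshot duration buckets shown earlier in this article.
    snapshot = [(0.16, 0), (0.32, 3286), (0.64, 4617), (1.28, 4620), (math.inf, 4620)]
    print(histogram_quantile(0.99, snapshot))
```

Because the result is interpolated, its precision depends on how wide the bucket is: with the sample data, all the sketch can really say is that the 99th percentile lies between 0.32 and 0.64 seconds.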

Monitoring etcd metrics in Sysdig Monitor

To track etcd in Sysdig Monitor, you have to add some sections to the agent YAML configuration file and use Prometheus to gather and filter the metrics. You can skip the filtering, but it spares you from ingesting a lot of debugging metrics that you don’t want. So, first of all, you need Prometheus up and running. If you don’t have one, no worries: deploying a new Prometheus takes two commands, the first to create a namespace for Prometheus and the second to deploy it with Helm 3.
kubectl create ns monitoring
helm install -f values.yaml prometheus -n monitoring stable/prometheus
Where values.yaml is:
server:
  strategy:
    type: Recreate
  podAnnotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9090"
Once Prometheus is up and running, you have to create the rules. These rules create new metrics with a custom tag which will filter the metrics collected by the Sysdig agent. You can find the rules on, and all of the steps you have to follow to have the monitoring etcd in Sysdig. As an example, this will be the agent configmap:
apiVersion: v1
kind: ConfigMap
metadata:
  name: sysdig-agent
  namespace: sysdig-agent
data:
  prometheus.yaml: |-
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    scrape_configs:
    - job_name: 'prometheus' # config for federation
      honor_labels: true
      metrics_path: '/federate'
      metric_relabel_configs:
      - regex: 'kubernetes_pod_name'
        action: labeldrop
      params:
        'match[]':
          # Federate only the series tagged by the rules
          - '{sysdig="true"}'
      sysdig_sd_configs:
      - tags:
          namespace: monitoring
          deployment: prometheus-server
[Figure: Monitoring etcd with a dashboard on Sysdig Monitor]


etcd is a simple and robust service which is required to deploy a Kubernetes cluster. Even though the Raft distributed consensus algorithm is able to overcome most temporary network failures, node losses, cluster splits, etc., if you’re running Kubernetes in production, it is essential to monitor and set up alerts on relevant cluster events before it’s too late.

If you dig into the list of etcd error codes, there are, of course, more advanced cases than the ones covered in this brief article: maximum number of peers in the cluster, anomaly detection between the Kubernetes and etcd nodes, Raft internal errors, and registry size, to name a few. We will leave those for a second part of this article in the future.

Monitoring etcd with Sysdig Monitor is really easy. With just one tool you can monitor both etcd and Kubernetes. The Sysdig Monitor agent will collect all the etcd metrics and you can quickly set up the most important etcd alerts. If you haven’t tried Sysdig Monitor yet, you are just one click away from our free trial!