Top 10 PromQL examples for monitoring Kubernetes

In this article, you will find 10 practical Prometheus query examples for monitoring your Kubernetes cluster.

If you are just getting started with Prometheus and are still figuring out how to write PromQL queries, we’ve got you covered at Sysdig. A while ago, we created a PromQL getting started guide. Here we skip the theory and jump straight into PromQL examples.

These Prometheus query examples are based on our own experience from helping hundreds of customers monitor their Kubernetes clusters every day.

Top 10 Prometheus query examples

Count of pods per cluster and namespace

A pod count per namespace helps you spot namespaces running an unusually high or low number of pods.

sum by (kube_namespace_name) (kube_pod_info)
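On a plain kube-state-metrics setup, the namespace label is usually called namespace rather than kube_namespace_name (assumed here to come from Sysdig's label enrichment). A minimal sketch of the same count under that assumption, narrowed to the five busiest namespaces:

```promql
# Assumes the stock kube-state-metrics label `namespace`.
# topk keeps only the five namespaces with the most pods.
topk(5, sum by (namespace) (kube_pod_info))
```

Drop the topk wrapper to get the full per-namespace list again.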

Number of containers by cluster and namespace without CPU limits

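Setting the right limits and requests in your cluster is essential to optimizing application and cluster performance. This query counts, per namespace, the containers with no CPU limit: it takes every container reported by kube_pod_container_info and uses unless to drop those that have a CPU limit series.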

count by (namespace)(sum by (namespace,pod,container)(kube_pod_container_info{container!=""}) unless sum by (namespace,pod,container)(kube_pod_container_resource_limits{resource="cpu"}))
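The same pattern can be pointed at memory limits. The sketch below only swaps the resource label value, and assumes your kube-state-metrics version exposes memory limits under resource="memory", as the memory overcommit query in this article does:

```promql
# Containers per namespace that have no memory limit set.
count by (namespace) (
    sum by (namespace, pod, container) (kube_pod_container_info{container!=""})
  unless
    sum by (namespace, pod, container) (kube_pod_container_resource_limits{resource="memory"})
)
```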

Pod restarts by namespace

This query counts, per namespace, how many times pod readiness changed over the last five minutes, which serves as a proxy for restarts. It matters because a high restart rate often means a pod is stuck in CrashLoopBackOff.

sum by (kube_namespace_name)(changes(kube_pod_status_ready{condition="true"}[5m]))
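If you would rather count container restarts directly, kube-state-metrics also exposes a restart counter. A hedged sketch, assuming that counter is scraped in your setup and that the namespace label carries its stock name:

```promql
# Container restarts per namespace over the last 5 minutes.
# Assumes kube_pod_container_status_restarts_total and the stock `namespace` label.
sum by (namespace) (increase(kube_pod_container_status_restarts_total[5m]))
```

Because increase extrapolates over the window, the result may not be a whole number.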

Pods not ready

This query counts, per namespace, the pods that are not in the Ready state. It can be the first step when troubleshooting a situation.

sum by (kube_namespace_name) (kube_pod_status_ready{condition="false"})
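To see which pods are affected rather than a count per namespace, you can drop the aggregation. A minimal sketch, assuming the metric reports 1 for the condition that currently applies:

```promql
# One series per pod whose Ready condition is currently false.
kube_pod_status_ready{condition="false"} == 1
```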

CPU overcommit

When the sum of CPU limits exceeds the CPU capacity of the cluster, you can end up with CPU throttling issues, so this is a scenario to avoid. The following query subtracts capacity from limits; a positive result means the cluster is overcommitted.

sum(kube_pod_container_resource_limits{resource="cpu"}) - sum(kube_node_status_capacity_cpu_cores)
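In kube-state-metrics 2.x, node capacity is exposed as kube_node_status_capacity with a resource label instead of kube_node_status_capacity_cpu_cores. A sketch of the same check expressed as a ratio, assuming that newer metric name is available in your environment:

```promql
# Ratio of CPU limits to CPU capacity; a value above 1 means overcommit.
# Assumes the kube-state-metrics 2.x metric kube_node_status_capacity.
  sum(kube_pod_container_resource_limits{resource="cpu"})
/
  sum(kube_node_status_capacity{resource="cpu"})
```

A ratio is often easier to alert on than an absolute difference, since the threshold does not depend on cluster size.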

Memory overcommit

Memory limits that exceed the capacity of the cluster can lead to pod evictions when a node runs out of memory. This PromQL query subtracts memory capacity from memory limits, in bytes; a positive result means the cluster is overcommitted.

sum(kube_pod_container_resource_limits{resource="memory"}) - sum(kube_node_status_capacity_memory_bytes)
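The result of this query is in bytes. A hedged variant that reports GiB and uses the kube-state-metrics 2.x capacity metric (an assumption about your setup) could look like this:

```promql
# Memory limits minus memory capacity, converted from bytes to GiB.
# Assumes the kube-state-metrics 2.x metric kube_node_status_capacity.
(
    sum(kube_pod_container_resource_limits{resource="memory"})
  -
    sum(kube_node_status_capacity{resource="memory"})
) / (1024 * 1024 * 1024)
```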

Number of ready nodes per cluster

Count the nodes that currently report the Ready condition.

sum(kube_node_status_condition{condition="Ready", status="true"}==1)
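An absolute count is hard to judge without knowing the cluster size. As a sketch, the same metric can give the fraction of nodes that are ready:

```promql
# Fraction of nodes that are Ready (1 means every node is Ready).
  sum(kube_node_status_condition{condition="Ready", status="true"})
/
  count(kube_node_status_condition{condition="Ready", status="true"})
```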

Nodes readiness flapping

Identify nodes that flapped between the ready and not ready states more than twice in the last 15 minutes.

sum(changes(kube_node_status_condition{status="true",condition="Ready"}[15m])) by (node) > 2

CPU idle by cluster

Computing capacity is one of the most delicate things to configure, and it’s one of the fundamental steps when performing Kubernetes capacity planning. This query compares each container's CPU usage rate over the last 30 minutes with its CPU request and sums the requested cores that are going unused.

sum((rate(container_cpu_usage_seconds_total{container!="POD",container!=""}[30m]) - on (namespace,pod,container) group_left avg by (namespace,pod,container)(kube_pod_container_resource_requests{resource="cpu"})) * -1 >0)
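To find out where the idle cores are, the same expression can be grouped by namespace. A minimal sketch that changes only the outer aggregation:

```promql
# Idle requested CPU cores per namespace.
sum by (namespace) (
  (
      rate(container_cpu_usage_seconds_total{container!="POD",container!=""}[30m])
    - on (namespace, pod, container) group_left
      avg by (namespace, pod, container) (kube_pod_container_resource_requests{resource="cpu"})
  ) * -1 > 0
)
```

The group_left modifier keeps the labels of the usage series, so namespace is still available to the outer sum.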

Memory idle by cluster

Save money by detecting how much requested memory is underutilized in your cluster. This query returns the result in GiB.

sum((container_memory_usage_bytes{container!="POD",container!=""} - on (namespace,pod,container) avg by (namespace,pod,container)(kube_pod_container_resource_requests{resource="memory"})) * -1 >0 ) / (1024*1024*1024)
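container_memory_usage_bytes includes page cache, so it can understate how much memory is idle. A hedged alternative, assuming cAdvisor's container_memory_working_set_bytes is scraped in your cluster, measures against the working set instead:

```promql
# Idle requested memory in GiB, measured against the working set.
# Assumes the cAdvisor metric container_memory_working_set_bytes is available.
sum(
  (
      container_memory_working_set_bytes{container!="POD",container!=""}
    - on (namespace, pod, container)
      avg by (namespace, pod, container) (kube_pod_container_resource_requests{resource="memory"})
  ) * -1 > 0
) / (1024 * 1024 * 1024)
```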

Are we missing any queries? Tell us on Twitter, so we can keep this article up to date!

Want to dig deeper?

There are several resources available online to learn PromQL. You can download our PromQL Cheatsheet to learn how to write more complex PromQL queries.

Also, check out the great Awesome Prometheus alerts collection. It contains hundreds of Prometheus alert rules that you can inspect to learn more about PromQL and Prometheus.

More Prometheus query examples in our PromQL Library

We picked these Prometheus query examples from our PromQL Library in Sysdig Monitor. In this library, you’ll find a curated list of Prometheus query examples so you don’t have to start googling or asking on Stack Overflow how to write those PromQL queries.

Animation: the PromQL Library in Sysdig Monitor. Picking one of the suggested Prometheus query examples and clicking the Try me button opens it in the PromQL Explorer, where you can run it.

You can sign up for a free trial of Sysdig Monitor and try the new PromQL Library. Just find the PromQL query you need, click the Try me button, and voilà!
