Getting started with PromQL

Jesus Ángel Samitier

Published: March 11, 2021

Getting started with PromQL can be challenging when you first arrive in the fascinating world of Prometheus, so here’s a PromQL cheatsheet with interesting tips. Because Prometheus stores data in a time-series data model, queries in a Prometheus server are radically different from good old SQL.

Understanding how data is managed in Prometheus is key to learning how to write good, performant PromQL queries.

This article will introduce you to the PromQL basics and provide a cheatsheet you can download to dig deeper into Prometheus and PromQL.

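In this article, you will learn how time-series databases work, and how to get started with PromQL data selection, aggregators and operators, and functions.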

How time-series databases work

Time series are streams of values associated with a timestamp.

[Figure: three time series, each with different values at different timestamps]

Every time series is identified by its metric name and its labels, for example:

mongodb_up{}

kube_node_labels{cluster="aws-01", label_kubernetes_io_role="master"}

In the above example, you can see the metric name (kube_node_labels) and the labels (cluster and label_kubernetes_io_role). Although this is how metrics and labels are normally written, the metric name is actually a label too, stored as __name__. The query above can also be written like this:

{__name__ = "kube_node_labels", cluster="aws-01", label_kubernetes_io_role="master"}
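To make the idea concrete, here is a minimal Python sketch. It is a conceptual model, not how Prometheus stores data internally: a series is a set of labels, with the metric name kept under `__name__`, plus a list of (timestamp, value) samples. A selector made of equality matchers is then just a check that each requested label has the requested value. The `Series` class, the `matches` helper, and the sample data are hypothetical, and only non-empty equality matchers are modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Series:
    # The metric name is stored like any other label, under "__name__".
    labels: dict[str, str]
    # Samples are (unix_timestamp_seconds, value) pairs in time order.
    samples: list[tuple[float, float]] = field(default_factory=list)


def matches(series: Series, selector: dict[str, str]) -> bool:
    """Return True if every equality matcher in the selector holds for the series."""
    return all(series.labels.get(name) == value for name, value in selector.items())


master = Series(
    labels={"__name__": "kube_node_labels", "cluster": "aws-01",
            "label_kubernetes_io_role": "master"},
    samples=[(1000.0, 1.0), (1015.0, 1.0)],
)

# Equivalent of kube_node_labels{cluster="aws-01", label_kubernetes_io_role="master"}
selector = {"__name__": "kube_node_labels", "cluster": "aws-01",
            "label_kubernetes_io_role": "master"}
print(matches(master, selector))               # True
print(matches(master, {"cluster": "aws-02"}))  # False
```

Because the metric name is just another entry in the label set, the short form and the `{__name__=...}` form select exactly the same series.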

There are four types of metrics in Prometheus:

Gauges are arbitrary values that can go up and down. For example, mongodb_up tells us if the exporter has a connection to the MongoDB instance.
Counters are cumulative totals that only go up from the moment the exporter starts (they reset to zero when it restarts) and usually have the _total suffix. For example, http_requests_total.
Histograms sample observations, such as request durations or response sizes, and count them in configurable buckets.
Summaries work like histograms but also calculate configurable quantiles.
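As a rough illustration of the histogram type, the Python sketch below mimics the bucket values a Prometheus histogram exposes. Buckets are cumulative: each observation counts toward every bucket whose upper bound (the `le` label) is greater than or equal to it, and the last bucket is `+Inf`. Running `_sum` and `_count` totals are kept alongside. The class name and bucket bounds are hypothetical; real client libraries expose this through their own APIs.

```python
import math


class SketchHistogram:
    """Toy model of the cumulative buckets a Prometheus histogram exposes."""

    def __init__(self, bounds: list[float]):
        # Upper bounds (the "le" label values), always ending with +Inf.
        self.bounds = sorted(bounds) + [math.inf]
        self.buckets = [0] * len(self.bounds)  # cumulative count per bound
        self.sum = 0.0  # exposed as <name>_sum
        self.count = 0  # exposed as <name>_count

    def observe(self, value: float) -> None:
        # Every bucket whose upper bound is >= value includes this observation.
        for i, bound in enumerate(self.bounds):
            if value <= bound:
                self.buckets[i] += 1
        self.sum += value
        self.count += 1


h = SketchHistogram([0.1, 0.5, 1.0])
for duration in [0.05, 0.3, 0.3, 2.0]:  # hypothetical request durations (s)
    h.observe(duration)
print(list(zip(h.bounds, h.buckets)))
# [(0.1, 1), (0.5, 3), (1.0, 3), (inf, 4)]
```

The `+Inf` bucket always equals `_count`, which is why it can be used as the total number of observations.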

Getting started with PromQL data selection

Selecting data in PromQL is as easy as specifying the metric you want to get the data from. In this example, we will use the metric http_requests_total.

Imagine that we want to know the number of requests for the /api path in the host 10.2.0.4. To do so, we will use the labels host and path from that metric.

We could run this PromQL query:

http_requests_total{host="10.2.0.4", path="/api"}

It would return the following data:

Every row in the result represents a series with its last available value. As http_requests_total contains the number of requests made since the last counter restart, we see 98 successful requests.

This is called an instant vector: the most recent value of every series at the moment specified by the query. Because samples are scraped at intervals that rarely line up exactly with the query's evaluation time, Prometheus picks the latest sample at or before that time, looking back up to 5 minutes by default. If no time is specified, the query is evaluated at the current time, so it returns the last available value.

[Figure: three time series and the exact time the query took place, returning an instant vector with the nearest values]
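The selection rule can be sketched in a few lines of Python. This is a simplified model, not Prometheus's code: for one series, it returns the most recent sample at or before the evaluation time, and treats the series as absent if that sample is older than the lookback window (5 minutes by default). The function name and sample data are hypothetical, staleness markers are ignored, and the exact boundary handling is simplified.

```python
from typing import Optional

LOOKBACK_SECONDS = 5 * 60  # Prometheus's default lookback window


def instant_value(samples: list[tuple[float, float]], eval_time: float,
                  lookback: float = LOOKBACK_SECONDS) -> Optional[float]:
    """Latest value with eval_time - lookback < timestamp <= eval_time, else None."""
    candidates = [(ts, v) for ts, v in samples
                  if eval_time - lookback < ts <= eval_time]
    if not candidates:
        return None  # the series is missing from the instant vector
    return max(candidates)[1]  # the sample with the highest timestamp


# Samples scraped every 15 s, at times that don't line up with the query time.
samples = [(1003.0, 95.0), (1018.0, 97.0), (1033.0, 98.0)]
print(instant_value(samples, eval_time=1040.0))  # 98.0
print(instant_value(samples, eval_time=1025.0))  # 97.0
print(instant_value(samples, eval_time=2000.0))  # None (older than 5 minutes)
```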

Additionally, you can get an instant vector from another moment (e.g., from one day ago).

To do so, you only need to add an offset, like this:

http_requests_total{host="10.2.0.4", path="/api", status_code="200"} offset 1d

To obtain all the values within a time range, add the range duration in square brackets:

http_requests_total{host="10.2.0.4", path="/api"}[10m]

It would return something like this:

The query returns multiple values for each time series; that’s because we asked for data within a time range. Thus, every value is associated with a timestamp.

This is called a range vector: all the values for every series within a range of timestamps.

[Figure: three time series and the time range the query covers, returning a range vector with all the values inside the range]
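A simplified Python model of a range selector such as `[10m]`: it keeps every sample whose timestamp falls inside the window ending at the evaluation time. The helper name and data are hypothetical. The sketch treats the window as (t - range, t]; how the left boundary is handled has differed between Prometheus versions.

```python
def range_samples(samples: list[tuple[float, float]], eval_time: float,
                  range_seconds: float) -> list[tuple[float, float]]:
    """All (timestamp, value) samples in the window (eval_time - range, eval_time]."""
    start = eval_time - range_seconds
    return [(ts, v) for ts, v in samples if start < ts <= eval_time]


# One sample per minute from t=0 to t=1200; a [10m] range at t=1200
# keeps the last ten of them, each still paired with its timestamp.
samples = [(float(t), float(t // 60)) for t in range(0, 1260, 60)]
window = range_samples(samples, eval_time=1200.0, range_seconds=600.0)
print(len(window), window[0], window[-1])  # 10 (660.0, 11.0) (1200.0, 20.0)
```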

Getting started with PromQL aggregators and operators

As you can see, the PromQL selectors help you obtain metrics data. But what if you want to get more sophisticated results?

Imagine we had the metric node_cpu_cores with a cluster label. We could, for example, sum the results, aggregating them by that label:

sum by (cluster) (node_cpu_cores)

This would return something like this:

With this simple query, we can see that there are 100 CPU cores in cluster_foo and 50 in cluster_bar.
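Conceptually, `sum by (cluster)` groups the series by the value of the `cluster` label, adds up the values in each group, and drops every other label. Here is a minimal Python sketch of that grouping. The per-node values and the `node` label are hypothetical, chosen only so that the totals match the ones quoted above.

```python
from collections import defaultdict


def sum_by(series: list[tuple[dict[str, str], float]],
           by: list[str]) -> dict[tuple[str, ...], float]:
    """Sum instant-vector values grouped by the given label names."""
    groups: dict[tuple[str, ...], float] = defaultdict(float)
    for labels, value in series:
        # A missing label counts as the empty string, as in PromQL.
        key = tuple(labels.get(name, "") for name in by)
        groups[key] += value
    return dict(groups)


# Hypothetical node_cpu_cores instant vector: one series per node.
node_cpu_cores = [
    ({"cluster": "cluster_foo", "node": "a"}, 40.0),
    ({"cluster": "cluster_foo", "node": "b"}, 60.0),
    ({"cluster": "cluster_bar", "node": "c"}, 50.0),
]
print(sum_by(node_cpu_cores, ["cluster"]))
# {('cluster_foo',): 100.0, ('cluster_bar',): 50.0}
```

Grouping by an empty label list collapses everything into a single total, which is what a plain `sum(node_cpu_cores)` does.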

Furthermore, we can use arithmetic operators in our PromQL queries. For example, the metric node_memory_MemFree_bytes returns the amount of free memory in bytes; dividing it by 1024 * 1024 with the / operator gives that value in megabytes (strictly, mebibytes):

node_memory_MemFree_bytes / (1024 * 1024)

We could also get the percentage of free memory by dividing the previous metric by node_memory_MemTotal_bytes, which returns the total memory available in the node:

(node_memory_MemFree_bytes / node_memory_MemTotal_bytes) * 100

We could then use it to create an alert for nodes with less than 5% free memory:

(node_memory_MemFree_bytes / node_memory_MemTotal_bytes) * 100 < 5
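Behind these expressions, a binary operator between two instant vectors pairs up series whose labels are identical apart from the metric name. A comparison such as `< 5` then acts as a filter: series for which it is false are dropped from the result rather than returned as 0. The Python sketch below models that default one-to-one matching with hypothetical node values. It doesn't cover `on`/`ignoring`, group modifiers, or the `bool` modifier.

```python
def without_name(labels: dict[str, str]) -> frozenset:
    """Label set used for one-to-one matching: every label except __name__."""
    return frozenset((k, v) for k, v in labels.items() if k != "__name__")


def free_memory_percent(free: list[tuple[dict[str, str], float]],
                        total: list[tuple[dict[str, str], float]]) -> dict:
    """(free / total) * 100, matching series on all labels except __name__."""
    totals = {without_name(labels): value for labels, value in total}
    result = {}
    for labels, value in free:
        key = without_name(labels)
        if key in totals:  # series without a matching partner are dropped
            result[key] = value / totals[key] * 100
    return result


free = [({"__name__": "node_memory_MemFree_bytes", "instance": "n1"}, 2e9),
        ({"__name__": "node_memory_MemFree_bytes", "instance": "n2"}, 0.3e9)]
total = [({"__name__": "node_memory_MemTotal_bytes", "instance": "n1"}, 8e9),
         ({"__name__": "node_memory_MemTotal_bytes", "instance": "n2"}, 8e9)]

percent = free_memory_percent(free, total)
# "< 5" keeps only the series for which the comparison is true.
alerting = {key: pct for key, pct in percent.items() if pct < 5}
print(sorted(dict(key)["instance"] for key in alerting))  # ['n2']
```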

Getting started with PromQL functions

PromQL offers a vast collection of functions we can use to get even more sophisticated results. Continuing with the previous example, we could use the topk function to identify the two nodes with the highest free memory percentage:

topk(2, (node_memory_MemFree_bytes / node_memory_MemTotal_bytes) * 100)
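`topk(k, v)` returns the k series of the vector with the largest sample values, keeping their original labels. A minimal Python sketch using `heapq.nlargest`, with hypothetical percentages. Ties and NaN values are not handled the way Prometheus handles them.

```python
import heapq


def topk(k: int, vector: list[tuple[dict[str, str], float]]):
    """Return the k (labels, value) pairs with the largest values, largest first."""
    return heapq.nlargest(k, vector, key=lambda item: item[1])


free_percent = [({"instance": "n1"}, 25.0),
                ({"instance": "n2"}, 3.75),
                ({"instance": "n3"}, 61.5)]
for labels, value in topk(2, free_percent):
    print(labels["instance"], value)
# n3 61.5
# n1 25.0
```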

Prometheus can not only give us information about the past, but also extrapolate into the future. The predict_linear function uses simple linear regression to predict the value a time series will have the given number of seconds from now. You may remember that we used this function to cook the perfect holiday ham.

Imagine that you want to know how much free disk space will be left in 24 hours. You could apply the predict_linear function to the last week of data from the node_filesystem_free_bytes metric, which returns the free disk space available. The query below predicts the free disk space, in gigabytes, 24 hours from now, and the < 100 comparison keeps only the filesystems expected to have less than 100 GB free:

predict_linear(node_filesystem_free_bytes[1w], 3600 * 24) / (1024 * 1024 * 1024) < 100
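Under the hood, `predict_linear(v[range], t)` fits a simple linear regression (ordinary least squares) to the samples in the range and extrapolates it `t` seconds past the evaluation time. Here is a simplified Python sketch of that calculation. The helper name and the disk readings, which shrink by exactly 2 GiB per day, are hypothetical, and the sketch needs at least two samples with distinct timestamps.

```python
def predict_linear(samples: list[tuple[float, float]], eval_time: float,
                   seconds_ahead: float) -> float:
    """Least-squares line through (timestamp, value) samples, extrapolated ahead."""
    n = len(samples)
    # Measure time relative to the evaluation time, so the intercept is
    # the fitted value "now".
    xs = [ts - eval_time for ts, _ in samples]
    ys = [v for _, v in samples]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return intercept + slope * seconds_ahead


GIB = 1024 ** 3
DAY = 3600 * 24
# One reading per day over a week: 150 GiB free, shrinking by 2 GiB per day.
week = [(float(d * DAY), (150 - 2 * d) * GIB) for d in range(8)]
now = 7 * DAY
predicted_gib = predict_linear(week, now, DAY) / GIB
print(round(predicted_gib, 6))  # 134.0
print(predicted_gib < 100)      # False: this filesystem would not match "< 100"
```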

When working with Prometheus counters, the rate function is pretty convenient. It calculates the per-second average rate of increase of a counter over the time range, accounting for counter resets and extrapolating at the edges of the range to provide better results.

What if we need to create an alert when we haven’t received a request in the last 10 minutes? We couldn’t just use the raw http_requests_total values, because if the counter got reset during the time range, the results wouldn’t be accurate.

http_requests_total[10m]

In the example above, the counter got reset, so its value drops from 300 to 50, and a simple difference between those samples would be negative. Using just this metric isn’t enough. Here is where the rate function comes to the rescue: as it takes resets into account, it treats the counter as if it had kept increasing:

rate(http_requests_total[10m])

Regardless of the resets, there were on average 0.83 requests per second over the last 10 minutes. Now we can configure the desired alert:
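The reset handling described above can be sketched in Python. This is a simplified stand-in for PromQL's rate, not its actual implementation. When a sample is lower than the previous one, the sketch assumes the counter restarted from zero and counts the new value as fresh increase. Unlike the real function, it skips the extrapolation to the range boundaries and divides by the time covered by the samples. The `simple_rate` name and the one-sample-per-minute data are hypothetical, chosen to mirror a reset from 300 to 50.

```python
def simple_rate(samples: list[tuple[float, float]]) -> float:
    """Per-second increase of a counter, compensating for resets.

    Simplified: no extrapolation to the range boundaries, unlike PromQL's rate().
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        if curr >= prev:
            increase += curr - prev
        else:
            # Counter reset: it restarted from 0, so all of curr is new increase.
            increase += curr
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# Ten minutes of hypothetical counter values, one per minute, reset after 300.
counts = [100, 150, 200, 250, 300, 50, 100, 150, 200, 250, 300]
samples = [(60.0 * i, float(c)) for i, c in enumerate(counts)]
print(samples[-1][1] - samples[0][1])  # 200.0: last minus first misses 300 requests
print(round(simple_rate(samples), 2))  # 0.83 requests per second
```

A counter that never moves yields a rate of exactly 0, which is the case the `== 0` alert below is looking for.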

rate(http_requests_total[10m]) == 0

Next steps

In this article, we learned how Prometheus stores data and how to start selecting and aggregating data with PromQL examples.

You can download the PromQL Cheatsheet to learn more PromQL operators, aggregations, and functions, as well as examples. You can also try all the examples in our Prometheus playground.

Also, you can take a look at our Top 10 PromQL Examples for monitoring Kubernetes to get some inspiration.

You can also try the Sysdig Monitor Free 30-day Trial, since Sysdig Monitor is fully compatible with Prometheus. You’ll get started in just a few minutes.

More Prometheus query examples in our PromQL Library

We recently released our PromQL Library in Sysdig Monitor. In this library, you’ll find a curated list of Prometheus query examples, so you don’t have to start googling or asking on Stack Overflow how to write those PromQL queries.

You can sign up for a free trial of Sysdig Monitor and try the new PromQL Library. Just find the PromQL query you need, click the Try me button, and voilà!
