Prometheus metrics: instrumenting your app with custom metrics and autodiscovery on Docker containers

October 18, 2017
Prometheus and Sysdig

Prometheus metrics allows you to instrument your app with tagged custom metrics. We will talk about the format, types, benefits, how to instrument an application with Prometheus libraries for custom metrics and how Sysdig Monitor can autodiscover these metrics on Docker containers in Kubernetes.

What is Prometheus vs Prometheus metrics?

… And Prometheus stole the metrics from Mount Olympus and gave them to mankind.

Prometheus is an open-source monitoring solution oriented towards highly dimensional data and decentralized, autonomous data sources, with a powerful query language.

Prometheus metrics are increasingly becoming a de facto standard for custom metrics instrumentation.

Instrumenting your application allows you to measure things like:

  • How much time the customers spend on average on the product checkout page
  • How big is a certain data structure in memory
  • How many database requests are made by a certain code function

It certainly provides a lot of visibility into how things are working internally, but it requires instrumenting (modifying) your application code.

Instead of making you handle metric calculation and export on your own, the Prometheus project provides client libraries for more than 10 programming languages. And these libraries aren't tied to monitoring with Prometheus alone: they can export metrics to other time-series databases like InfluxDB, OpenTSDB or Graphite.

Sysdig Monitor is great at discovering and tagging custom metrics, and like we do with statsd or JMX, Sysdig will now automatically detect Prometheus metrics too!

Prometheus metrics format: dot-metrics vs tagged metrics

Let’s start with dot-notated metrics, for those of you familiar with the statsd metric format, this will be nothing new. In essence, everything you need to know about the metric is contained within the name of the metric. For example:
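A couple of illustrative names (these are hypothetical, just to show the style):

```
production.webserver1.nginx.requests.get.count
production.webserver1.nginx.requests.post.count
```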

These metrics provide the detail and the hierarchy needed to effectively utilize your metrics. In order to make it fast and easy to use your metrics, this model of metrics exposition suggests that if you’d like a different aggregation, then you should calculate that metric up front and store it. So, with our example above, let’s suppose we were interested in request metrics across the entire service. We might flip our metrics to look something like this:
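Continuing the hypothetical example, the service-wide aggregates would be pre-computed and stored under their own names:

```
production.webservers.requests.get.count
production.webservers.requests.post.count
```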

You could imagine many more combinations of metrics that might work for you. In fact, if you read a little about this on sites like StackOverflow, you’ll see that the exact setup that you’d end up with is somewhere between art and science.

The Prometheus metric format takes a flat approach to naming metrics. Instead of a hierarchical, dot-separated name, you have a name combined with a series of labels or tags:

<metric name>{<label name>=<label value>, ...}

A time series with the metric name http_requests_total and the labels service="service", server="pod50" and env="production" could be written like this:

http_requests_total{service="service", server="pod50", env="production"}

Highly dimensional data basically means that you can associate any number of application-specific labels with every metric you submit. These labels are the key-value pairs that will be used for grouping / graphing / segmentation / computation of composite views.

Imagine a typical metric like requests_per_second that every one of your web servers emits. Now you add the labels (or dimensions):

  • Web Server software (Nginx, Apache)
  • Environment (production, staging)
  • HTTP method (POST, GET)
  • Error (yes, no)
  • HTTP response code (number)
  • Endpoint (/webapp1, /webapp2)
  • Datacenter zone (east, west)

And voila! You already have N-dimensional data, and can easily obtain the following example graphs:

  • Total number of requests per web server in production
  • Number of HTTP errors using Apache for webapp2 in staging
  • Number of POST requests processed per datacenter in a certain zone

This comes at the cost of heavier data post-processing of course, but modern monitoring systems are prepared for that.
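As an illustration, the first of those graphs corresponds to a PromQL query like the following (reusing the http_requests_total metric and the server and env labels from the example above):

```
# total requests per second, per web server, in production
sum by (server) (rate(http_requests_total{env="production"}[5m]))
```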

When do you need to do code instrumentation?

Gaining insight into your applications sounds like a fantastic idea, and it is! But like all things in life, it has a few tradeoffs to consider:

  • APM-style instrumentation is simple code, but it is code nonetheless. This means it needs to be done at development time, adds complexity, and can introduce software bugs. Legacy or external applications are typically very hard to instrument, and if developers haven't instrumented an application, it's usually out of scope for operations people to do it.

  • Performance overhead: no monitoring is free. If you plan to monitor critical, highly optimized loop code, you may end up spending more time emitting metrics than doing the actual work. And more metrics in your monitoring backend will require more computing resources there.

Types of Prometheus metrics

The Prometheus client libraries offer four core metrics types:

  • Counter: A cumulative metric that represents a single numerical value that only ever goes up, for example the total number of requests served.

  • Gauge: A gauge is a metric that represents a single numerical value that can arbitrarily go up and down. For example, the number of active connections.

  • Histogram: A histogram samples observations (usually things like request durations or response sizes) and counts them in configurable buckets. It also provides a sum of all observed values. Most commonplace metrics fall under this category.

  • Summary: Similar to a histogram, a summary samples observations (usually things like request durations and response sizes). While it also provides a total count of observations and a sum of all observed values, it calculates configurable quantiles over a sliding time window. A typical example is getting the 95th percentile of requests duration to understand the worst case responsiveness of our system.
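A quick sketch of how these four types look with the official Python client (the metric names here are made up for illustration):

```python
from prometheus_client import Counter, Gauge, Histogram, Summary, generate_latest

# Counter: a cumulative value that only ever goes up
REQUESTS = Counter('app_requests', 'Total requests served')
# Gauge: a value that can go up and down arbitrarily
CONNECTIONS = Gauge('app_active_connections', 'Currently active connections')
# Histogram: observations counted in configurable buckets, plus count and sum
LATENCY = Histogram('app_request_latency_seconds', 'Request latency')
# Summary: count and sum of observations (some client libraries add quantiles)
PAYLOAD = Summary('app_payload_bytes', 'Payload size of requests')

REQUESTS.inc()          # increment the counter by 1
CONNECTIONS.set(42)     # set the gauge to an absolute value
LATENCY.observe(0.25)   # record one latency observation
PAYLOAD.observe(512)    # record one payload size observation

# print the metrics in the Prometheus exposition format
print(generate_latest().decode())
```

Note that the Python client automatically appends the _total suffix to counters in the exposed output, following the Prometheus naming conventions.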

A code example could be useful here. But first, looking back at the requests_per_second N-dimensional metric we just mentioned: Sysdig Monitor can see inside your containers and processes and decode the most popular application protocols, like HTTP and databases such as MySQL, PostgreSQL, Redis, MongoDB or Cassandra, so you don't really need Prometheus metrics instrumentation to monitor requests, response times or errors ;-) (read more on must-have application key metrics in Brendan Gregg's USE method).

But what if you need a custom metric or some other value that you can only obtain from your app's source code? These cases are where Prometheus metrics shine.

How to add custom metrics to your application

Let’s see a real small and quick example using Python.

First, if you haven't done so already, install the Python Prometheus client library:

sudo pip install prometheus_client

And execute this trivial Python script, small but enough to see metric labels and histograms in action:
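A minimal sketch that matches the output shown below (a histogram named function_exec_time with a func_name label, exposed on port 9100; the sleep-based workload is an assumption):

```python
import random
import time

from prometheus_client import Histogram, start_http_server

# Histogram with a label, matching the exposition output shown below
FUNC_TIME = Histogram(
    'function_exec_time',
    'Time spent processing a function',
    ['func_name'],
)

@FUNC_TIME.labels(func_name='func1').time()
def func1():
    # simulate a variable amount of work
    time.sleep(random.uniform(0.001, 0.01))

start_http_server(9100)  # exposes metrics at http://localhost:9100
for _ in range(10):
    func1()
# in a real application, your main loop would keep the process alive here
```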

You can display the raw Prometheus data by accessing http://localhost:9100:
# HELP python_info Python platform information
# TYPE python_info gauge
python_info{implementation="CPython",major="2",minor="7",patchlevel="13",version="2.7.13"} 1.0
# HELP function_exec_time Time spent processing a function
# TYPE function_exec_time histogram
function_exec_time_bucket{func_name="func1",le="0.005"} 0.0
function_exec_time_bucket{func_name="func1",le="0.01"} 0.0

Why does Sysdig love Prometheus metric support?

Prometheus libraries are open source and make pushing out metrics from any software platform and/or programming language super easy. As a result, a lot of the tools we love and use already bundle them, and you can enable application-specific metrics with little configuration.

Take LinkerD as an example: you can enable Prometheus metrics with just one configuration line.
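In Linkerd 1.x that amounts to adding the Prometheus telemeter to the configuration (this fragment is a sketch based on the Linkerd 1.x telemetry plugins):

```
telemetry:
- kind: io.l5d.prometheus
```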

Even if you already have full-blown Prometheus-based monitoring on your Kubernetes cluster, you can now integrate Sysdig Monitor without friction. Sysdig will autodiscover the metrics you already expose and provide additional metadata on top, the same way we tag absolutely everything. All this without re-instrumenting your code and systems or causing any incompatibility or disturbance to your current setup.

Look Ma, no queries! Visualizing Prometheus metrics in Sysdig Monitor

We’ve applied all the goodness of the Sysdig Explore functionality (in addition to dashboards, alerting etc…) to your Prometheus metrics. So, in one screenshot:

prometheus metrics interface sysdig

The Explore interface lets you use all your labels to orient the way you display your metrics. That means you can take a physical (host, datacenter, etc) view, or with a click, change to have a logical view (Kubernetes deployments, pods, etc). Clicking on any row then scopes your metrics to that subsection of your infrastructure.

You can then browse or search your metrics, click on one of them, and then adjust aggregations and segment the metric by other, logical breakouts.

prometheus metrics list sysdig

And just like that, you’ve got all of your data at your fingertips with a few clicks. What’s more, you can easily enable your entire team to use this data, without trying to teach them a new DSL or (worse) being their query monkey.

prometheus metric graph sysdig

On top of that, you can use Prometheus metrics and dashboards to configure the related alerts and notifications:

Prometheus metrics alert

Willing to try this on your system already? Here is how to enable and configure Prometheus metrics on the Sysdig agent.

Prometheus exporters and side-car containers

Much of the popular server software already bundles its own stats and metrics; think of the Apache status page, for instance. Exporters collect and adapt these metrics so they can be consumed by Prometheus. You can find exporters running as a single process or as a side-car container in a Kubernetes pod.

prometheus exporter

In these cases, the Sysdig agent will automatically collect the exporter data for you, although we recommend using Sysdig's application-specific checks, as they offer a better experience (more visibility and tagged metrics).


We have seen how to extend container and service visibility with Prometheus custom metrics for APM-style monitoring and gain visibility into what your code is doing. Prometheus libraries are broadly used for custom metrics, no matter which Kubernetes monitoring system you use.

Sysdig Monitor will now unlock all the goodness from Prometheus with no disruption to your developers and no work maintaining or building a skill set around a complex new tool. With compatibility to co-exist or migrate, Prometheus metrics and Sysdig are now best friends.

Check out our code-instrumented version of the Docker’s demo example-voting-app and test it yourself with a 15 host, 300 container Sysdig Monitor free trial!

Eager to learn more? Join our webinar Container Troubleshooting with Sysdig

By the way, we are running a webinar discussing the challenges of troubleshooting issues and errors in Docker containers and Kubernetes, like pods in CrashLoopBackOff. Join this session and learn:

  • How to gain visibility into Docker containers with Sysdig open source and Sysdig Inspect
  • Demo: troubleshoot a 502 Bad Gateway error on containerized app with HAproxy
  • Demo: troubleshoot a web application that mysteriously dies after some time
  • Demo: an Nginx Kubernetes pod goes into CrashLoopBackOff. What can you do? We'll show you how to find the error without SSHing into production servers

Join Container Troubleshooting with Sysdig webinar
