Prometheus 2.35 – What’s new?

Prometheus 2.35 was released last month, focusing on better integration with cloud providers. It also improved service discovery, performance, and resource usage.

One key change is the migration to Go 1.18, which changes how TLS 1.0, TLS 1.1, and certificates signed with the SHA-1 hash function are handled.

Welcome to this first edition of What’s new in Prometheus. We love Prometheus, the de-facto open source standard monitoring tool!

In this article, we will analyze some new features, and the impact they might have on the Prometheus community. Here’s our editor’s pick:

#10501 Build with Go 1.18

Go 1.18 is now the default version for building Prometheus. This has two major implications for TLS connections and certificate validation:

- TLS 1.0 and 1.1 are disabled by default on the client side. If a target still depends on one of these versions, Prometheus now lets you define the minimum TLS version with the min_version config parameter (#10610).
- Certificates signed with the SHA-1 hash function are rejected. This doesn’t apply to self-signed root certificates.
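
As an illustration, a scrape job for a legacy target that still speaks only TLS 1.0 could lower the minimum version as sketched below. The job name and target address are hypothetical; min_version sits inside the tls_config block and takes values such as TLS10, TLS11, TLS12, or TLS13.

```yaml
scrape_configs:
  - job_name: legacy-exporter            # hypothetical job name
    scheme: https
    tls_config:
      # Re-allow TLS 1.0 only for this job; other jobs keep the Go 1.18 default.
      min_version: TLS10
    static_configs:
      - targets: ["legacy-host.example.com:9100"]   # hypothetical target
```

Scoping the override to a single job keeps the stricter default for everything else, which is usually preferable to weakening TLS globally.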

David Lorite – Integrations Engineer at Sysdig

#10516, #10476, #10365 Better integration with cloud providers

As in previous versions, Prometheus keeps improving its support for cloud providers. This release includes the libraries needed to authenticate with Google Cloud Platform (GCP) Kubernetes clusters (e.g. GKE) when they are configured in the Kubernetes service discovery.

The service discovery is more stable for Azure. Before this release, if any node was unreachable, the service discovery failed for the whole cluster. Now, the service discovery continues with the rest of the available nodes. The release also includes a metric (prometheus_sd_azure_failures_total) that tracks failures during Azure service discovery, such as nodes that aren’t ready. You can use this new metric to configure an alert that detects problems in Azure service discovery:

rate(prometheus_sd_azure_failures_total[5m]) > 0
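
The expression above could be wrapped in an alerting rule along these lines. This is a minimal sketch: the group name, alert name, "for" duration, and severity label are assumptions to adapt to your own conventions.

```yaml
groups:
  - name: azure-service-discovery               # hypothetical group name
    rules:
      - alert: AzureServiceDiscoveryFailing     # hypothetical alert name
        expr: rate(prometheus_sd_azure_failures_total[5m]) > 0
        for: 10m                                # assumed: ignore short-lived failures
        labels:
          severity: warning                     # assumed severity scheme
        annotations:
          summary: "Azure service discovery is failing on {{ $labels.instance }}"
```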

Additionally, the Azure service discovery now accepts an optional resource group filter (resource_group in the configuration). This helps reduce the load generated by the Prometheus service discovery, especially when running several AKS clusters, since Azure rate-limits ARM API requests.
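
A scrape job restricted to one resource group might look like the sketch below. The job name, resource group name, and port are hypothetical, and the credential fields are placeholders to fill in for your own subscription.

```yaml
scrape_configs:
  - job_name: azure-vms                        # hypothetical job name
    azure_sd_configs:
      - subscription_id: "<subscription-id>"   # placeholder
        tenant_id: "<tenant-id>"               # placeholder
        client_id: "<client-id>"               # placeholder
        client_secret: "<client-secret>"       # placeholder
        # Only discover VMs in this resource group instead of the whole subscription.
        resource_group: my-aks-nodes-rg        # hypothetical resource group
        port: 9100                             # assumed exporter port
```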

Carlos Adiego – Integrations Engineer at Sysdig

#10080, #9570 Enhancements in Kubernetes Service Discovery

This version includes two exciting changes in the Kubernetes service discovery.

- Node metadata (name, address, labels, annotations) can now be attached to the targets selected through the pod role (#10080). This lets you write relabeling rules that allow-list or deny-list targets from specific nodes. Even more interesting, you can add the node labels to all metrics, which later lets you group by node or by node labels (or annotations). There is a proposal to include namespace metadata in an analogous way. To add all the node labels to your metrics, you can use the labelmap action in the relabeling:
```
# Copies every __meta_kubernetes_node_* label to a kubernetes_node_* label.
action: labelmap
regex: __meta_kubernetes_node_(.+)
replacement: 'kubernetes_node_$1'
```
- The EndpointSlice API used by the Kubernetes service discovery is promoted from v1beta1 to v1 (#9570), as v1beta1 is deprecated in Kubernetes 1.22.
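
The node metadata from #10080 is opt-in: it has to be requested with attach_metadata in the Kubernetes service discovery block, and Prometheus needs permission to read Node objects. The sketch below shows one way to combine it with relabeling. The job name, the node label used in the keep rule, and the node_label_ prefix are assumptions chosen for illustration.

```yaml
scrape_configs:
  - job_name: kubernetes-pods                  # hypothetical job name
    kubernetes_sd_configs:
      - role: pod
        attach_metadata:
          node: true                           # attach node metadata to pod targets
    relabel_configs:
      # Keep only pods scheduled on Linux nodes (illustrative filter).
      - source_labels: [__meta_kubernetes_node_label_kubernetes_io_os]
        regex: linux
        action: keep
      # Copy node labels onto every scraped series under an assumed prefix.
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
        replacement: node_label_$1
```

Copying every node label can noticeably increase the number of labels per series, so a narrower regex may be a better fit in large clusters.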

David de Torres – Engineering manager at Sysdig

Do you want to know more about this? Check out What’s new in Kubernetes 1.24!

#10369 New stats for queries computational cost

Prometheus already had timing statistics for queries. However, because they are based on response time, the same query can report different execution times depending on the load of the system or on other queries running in parallel. That was the motivation for adding three new statistics in this version, which count samples instead of measuring time: totalQueryableSamples, totalQueryableSamplesPerStep, and peakSamples.

This is great news for engineers working on optimizing complex queries. Now, they can evaluate and compare the performance of the queries that they build.

Aleksandar Ponjavic – Engineering Manager at Sysdig

#10317, #10500 Optimization of TSDB at start

There are two enhancements in this version that involve the start of the TSDB.

- Temporary files are now deleted at startup (#10317), a cleanup that could be skipped in some scenarios. This way, Prometheus ensures that it’s not using more disk space than needed and becomes a better citizen of the cluster.
- The Write Ahead Log (WAL) is read more efficiently at startup (#10500). This significantly reduces the time Prometheus needs to bring the TSDB up to date after an unclean shutdown.

Jesús Ángel Samitier – Integrations Engineer at Sysdig

#10498 Automatically limit CPU usage to the container CPU limit

Prometheus 2.35 brings a new feature that automatically sets Go’s GOMAXPROCS variable to match the container CPU limit. This keeps the Go runtime from scheduling more parallel work than the container is allowed to use. The feature is still experimental and can be enabled with --enable-feature=auto-gomaxprocs.
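
In a Kubernetes manifest, enabling the flag could look like the container fragment below. This is a sketch under assumptions: the container name, config file path, and CPU limit are illustrative, and the fragment is meant to be merged into an existing pod spec.

```yaml
containers:
  - name: prometheus                           # hypothetical container name
    image: prom/prometheus:v2.35.0
    args:
      - --config.file=/etc/prometheus/prometheus.yml   # assumed config path
      - --enable-feature=auto-gomaxprocs       # experimental feature flag
    resources:
      limits:
        cpu: "2"                               # GOMAXPROCS is derived from this limit
```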

Carlos Arilla – Technical Product Manager at Sysdig

These were the new features chosen by our team, but there are more. You can find the full list of changes in the official release notes of Prometheus 2.35.