
How to monitor Istio, the Kubernetes service mesh

In this article we are going to deploy and monitor Istio over a Kubernetes cluster. Istio is a service mesh platform that offers advanced routing, load balancing, security and high-availability features, plus Prometheus-style metrics for your services out of the box.

What is Istio?

Istio is a platform used to interconnect microservices. It provides advanced network features like load balancing, service-to-service authentication, monitoring, etc., without requiring any changes to the service code.

In the Kubernetes context, Istio deploys an Envoy proxy as a sidecar container inside every pod that provides a service.

These proxies mediate every connection, and from that position they route the incoming / outgoing traffic and enforce the different security and network policies.

This dynamic group of proxies is managed by the Istio “control plane”, a separate set of pods that orchestrate the routing, Kubernetes security, live ruleset updates, etc.

Istio architecture overview


You can find detailed descriptions of each subsystem component in the Istio project docs.

Service mesh explained: The rise of the “service mesh”

Containers are incredibly light and fast, so it's no surprise that their density is roughly one order of magnitude greater than that of virtual machines. Classical monolithic component interconnection diagrams are rapidly turning into highly dynamic, fault-tolerant, N-to-N communications with their own internal security rules, label-based routes, DNS and service directories, etc. This is the famous microservice mesh.

This means that while autonomous software units (containers) are becoming simpler and more numerous, interconnecting them and troubleshooting distributed software behavior is actually getting harder.

And of course, we don't want to burden containers with this complexity; we want them to stay thin and platform agnostic.

Kubernetes already offers a basic abstraction layer separating the service itself from the server pods. Several software projects are striving to tame this complexity, offering visibility, traceability and other advanced pod networking features. We already covered how to monitor Linkerd; let's now talk about Istio.

Istio features overview

  • Intelligent routing and load balancing: Policies to map static service interfaces to different backend versions, allowing for A/B testing, canary deployments, gradual migration, etc. Istio also allows you to define routing rules based on HTTP-layer metadata like session tokens or the user agent string.
  • Network resilience and health checks: Timeouts, retry budgets, health checks and circuit breakers, ensuring that unhealthy pods can be quickly weeded out of the service mesh.
  • Policy enforcement: Peer TLS authentication, pre-condition checking (whitelists and similar ACLs), quota management to avoid service abuse and/or consumer starvation.
  • Telemetry, traceability and troubleshooting: Telemetry is automatically injected into any service pod, providing Prometheus-style network and L7 protocol metrics. Istio also dynamically traces the flow and chained connections of your microservices mesh.

How to deploy Istio in Kubernetes

Istio deployment overview

Istio developers have streamlined and simplified deploying the components in a new or existing Kubernetes cluster.

Just make sure that your Kubernetes version is 1.12 or newer, and that you don't have any older version of Istio already installed on the cluster.
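A quick way to verify both points before you start (a sketch, assuming kubectl is already pointing at the target cluster):

$ kubectl version --short        # the reported Server Version should be v1.12 or newer
$ kubectl get crds | grep 'istio.io'   # should return nothing on a cluster without Istio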

Follow the installation instructions here. You can perform a manual installation using the YAML files provided in the repository, but in this article we are going to use the Helm method, which is slightly faster and easier to customize.

Helm installation

You need to install the helm binary on your local host; you can download it here.

First, we are going to create a service account for Tiller:

$ kubectl apply -f install/kubernetes/helm/helm-service-account.yaml

Tiller needs to use this service account to install Istio in the cluster:

$ helm init --service-account tiller

Install the istio-init chart, which bootstraps all of Istio's CRDs:

$ helm install install/kubernetes/helm/istio-init --name istio-init --namespace istio-system
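You can check that the CRD creation jobs finished and that the CRDs are registered before moving on. The exact count depends on the Istio version and enabled options (the Istio 1.2 docs mention 23 for this chart), so treat the number below as orientative:

$ kubectl get crds | grep 'istio.io' | wc -l
23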

And finally, we use another chart to install Istio itself. There are different options available depending on whether you want a minimal installation, the full one, one with demo apps, etc. In this case we want to install with the demo apps included:

$ helm install install/kubernetes/helm/istio --name istio --namespace istio-system \
    --values install/kubernetes/helm/istio/values-istio-demo.yaml

Check that both charts are in DEPLOYED state:

$ helm list
NAME      	REVISION	UPDATED                 	STATUS  	CHART           	APP VERSION	NAMESPACE   
istio     	1       	Thu Aug 15 10:40:50 2019	DEPLOYED	istio-1.2.4     	1.2.4      	istio-system
istio-init	1       	Thu Aug 15 10:27:32 2019	DEPLOYED	istio-init-1.2.4	1.2.4      	istio-system

Istio system services and pods will be ready in a few minutes:

$ kubectl get pods -n istio-system
NAME                                      READY   STATUS      RESTARTS   AGE
grafana-7869478fc5-x9cst                  1/1     Running     0          2m58s
istio-citadel-7bb58ffbbb-tg27w            1/1     Running     0          2m57s
istio-egressgateway-685b9654d6-f2bfx      1/1     Running     0          2m58s
istio-galley-5bf56745cf-qtgxl             1/1     Running     0          2m58s
istio-ingressgateway-856f5cb7f4-tvngk     1/1     Running     0          2m58s
istio-init-crd-10-2p2v4                   0/1     Completed   0          16m
istio-init-crd-11-s6gw8                   0/1     Completed   0          16m
istio-init-crd-12-8dh4h                   0/1     Completed   0          16m
istio-pilot-5569b96867-tlr9c              2/2     Running     0          2m57s
istio-policy-67d64695df-lrvxt             2/2     Running     2          2m57s
istio-sidecar-injector-6554655654-5xgjn   1/1     Running     0          2m57s
istio-telemetry-856d448bb-kkwd5           2/2     Running     2          2m57s
istio-tracing-79db5954f-5glt7             1/1     Running     0          2m56s
kiali-7b5b867f8-ktxs9                     1/1     Running     0          2m58s
prometheus-5b48f5d49-thgwc                1/1     Running     0          2m57s

Injecting the Istio Envoy proxy in your existing Kubernetes pods

As we showed in the architecture diagram, any service pod needs to be bundled with the Envoy container if you want to enable the Istio features for it. You can use the istioctl tool included in the repository to manually inject the Envoy configuration at deployment time:

$ istioctl kube-inject -f demo-red.yaml | kubectl apply -f -

But you will probably want a bit more automation and simplicity. Istio already provides an automatic injection mechanism that uses the Kubernetes admission controllers capability.

To enable automatic injection, you just need to label the target namespace (assuming admission controllers are enabled in your cluster; they should be by default):

$ kubectl label namespace default istio-injection=enabled
$ kubectl get namespace -L istio-injection
  NAME           STATUS    AGE       ISTIO-INJECTION
  default        Active    121m      enabled
  istio-system   Active    42m
  kube-public    Active    121m
  kube-system    Active    121m

If you deploy the example app:

$ kubectl apply -f samples/sleep/sleep.yaml

You should see this deployment has just one pod:

$ kubectl get deployment sleep
NAME    DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
sleep   1         1         1            1           19s

But the number of containers in the pod is two; the second one is the Envoy proxy:

$ kubectl get pods
  NAME                     READY     STATUS    RESTARTS   AGE
  sleep-86cf99dfd6-fz2bj   2/2       Running   0          5m

OK, it looks like our pods and services have been correctly instrumented, but what happened here?

The MutatingWebhookConfiguration provided by Istio selects pods in namespaces labeled istio-injection=enabled; when Kubernetes invokes the webhook, it injects the sidecar container into them.

You can configure the injection policy and the sidecar injection template by modifying the istio-sidecar-injector ConfigMap in the istio-system namespace.
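For instance, you can take a quick look at the current injection policy. The policy and the sidecar template typically live under the config key of that ConfigMap's data (a sketch; the layout may vary slightly between Istio versions):

$ kubectl -n istio-system get configmap istio-sidecar-injector \
    -o jsonpath='{.data.config}' | grep '^policy:'
policy: enabled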

How to monitor Istio using Prometheus

One of the major infrastructure enhancements of tunneling your service traffic through the Istio Envoy proxies is that you automatically collect fine-grained metrics that provide high-level application information (since they are reported for every service proxy).

These individual metrics are then forwarded to the Mixer component (the pod is called istio-telemetry), which aggregates and processes them for the entire mesh.

Istio provides multiple Prometheus endpoints. Let's start with the basics:

  • Port 42422 on the istio-telemetry service provides the ingested and processed metrics computed by the Mixer. This is the port that you need to scrape to monitor Istio applications (a quick sanity check is sketched right after this list).
  • Istio components (mixer, galley, policy, citadel) provide a self-monitoring port, 15014, with Prometheus metrics; we will use those to evaluate the health of the control plane.
  • You can scrape the raw metrics directly from the Envoy proxies in the application pods using port 15090.
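As a quick sanity check of the first endpoint, you can port-forward the istio-telemetry service and fetch its metrics page directly (a minimal sketch, assuming the default service name and the standard /metrics path; run the curl from a second terminal):

$ kubectl -n istio-system port-forward service/istio-telemetry 42422:42422
$ curl -s http://localhost:42422/metrics | grep istio_requests_total | head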

You can take a look at the different endpoints and metrics provided by Istio using the internal Prometheus server that we deployed with the installation bundle:

$ kubectl port-forward service/prometheus -n istio-system 9090:9090

You can now access the Prometheus server UI by opening http://localhost:9090/ in your web browser.

Istio Prometheus


There is also a Grafana deployment already preconfigured and ready to use:

$ kubectl port-forward service/grafana -n istio-system 3000:3000

You will be able to access this service at http://localhost:3000/. If you want to look at your Istio applications' performance, a good place to start is the workloads dashboard:

Grafana service

How to monitor Istio with Sysdig

Istio relies on the Prometheus metric format to provide telemetry, which is very convenient, because Sysdig will automatically detect and scrape Prometheus endpoints.

Let's edit the Sysdig agent configuration file (dragent.yaml). You just need to make sure that Prometheus scraping is enabled for your agents; look for this configuration snippet in the YAML and add it if it's not already there:

prometheus:
  enabled: true
  histograms: true

Sysdig will scrape the pods that are annotated with the Prometheus annotations; you can also use these annotations to configure the port and path where the metrics are exposed.

Let’s annotate the Mixer deployment to collect service mesh metrics (port 42422):

$ kubectl -n istio-system patch deployment istio-telemetry -p '{"spec":{"template":{"metadata":{"annotations":{"prometheus.io/scrape": "true", "prometheus.io/port": "42422"}}}}}'

You will notice that the original telemetry pod has been terminated and a new one has been created, containing the Prometheus annotations:

$ kubectl describe pod istio-telemetry-5f769b4fb9-clqhl -n istio-system
...
Annotations:    prometheus.io/port: 42422
                prometheus.io/scrape: true
                sidecar.istio.io/inject: false
...

Every control plane component exposes Prometheus metrics using port 15014. If you don't want to annotate them one by one, you can also modify the Sysdig agent configuration to automatically include any process exposing that port:

prometheus:
  enabled: true
  histograms: true
  process_filter:
    - include:
        port: 15014
        conf:
          port: 15014

Logging into the Sysdig Monitor web console, we can check that the new metrics are indeed flowing into our cloud platform (metricCount.prometheus).

metric count prometheus


We are now scraping the Istio Prometheus metrics endpoints; time to monitor Istio!

Monitoring Istio: reference metrics and dashboards

Let's start from the beginning: monitoring our services and application behavior.

Segmenting by service and service version, these are a few of the usual metrics that you want to monitor, coming both from the Istio Prometheus telemetry and from the Sysdig out-of-the-box metric collection:

  • Number of requests: istio_request_total
  • Request duration: istio_request_duration_seconds.avg, istio_request_duration_seconds.count
  • Request size: istio_request_bytes.avg
  • Also, segmented percentiles for your HTTP connections, to delimit your worst case scenario:
    • net.http.request.time.p50
    • net.http.request.time.p90
    • net.http.request.time.p99
  • HTTP error codes, using the response_code tag (see the sample query after this list)
  • Bandwidth and disk IO consumption: net.bytes.total, file.iops.total
  • Top accessed URLs, segmenting by net.http.url and net.http.method
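If you want to explore these numbers before they reach Sysdig, you can run an equivalent query against the bundled Prometheus server through its HTTP API (a sketch, assuming the port-forward from the previous section is still active and the standard Mixer metric name istio_requests_total; exact names can vary with your telemetry configuration):

$ curl -s http://localhost:9090/api/v1/query \
    --data-urlencode 'query=sum(rate(istio_requests_total[5m])) by (destination_service, response_code)'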

Using the Sysdig dashboard wizard you can quickly assemble your custom service overview with the most important metrics:

Monitoring Istio, services overview


Or just use our Istio default dashboards.

Istio System Overview

Istio dashboard overview


Istio Service

Istio Sysdig Dashboard Services


If you are familiar with Sysdig or have read other articles related to Kubernetes monitoring, you will soon realize that you already had similar HTTP / network metrics out of the box.

So, apart from the essentials, let’s highlight some of the additional features that Istio brings to the table in terms of monitoring.

HTTP request and response sizes (in bytes), not just as aggregate network bandwidth but measured individually for each HTTP connection:

Istio Sysdig Bandwidth Service

You have specific metrics to monitor gRPC connections, segmented by method, code, service, type, etc. Istio is able to route HTTP/2 & gRPC through its proxies.

Istio Sysdig GRPC

Thanks to Istio connection traceability, you can also monitor the metrics mentioned above (request count, duration, etc.) not only from the destination but also from the source internal service (or a version thereof):

Istio Sysdig Source Service

How to monitor Istio internals

Apart from monitoring the services, you can use Istio and Sysdig aggregated metrics to monitor the health and performance of Istio's internal services.

Istio provides its own ingress controller; this is a very relevant piece of our infrastructure to monitor. When your users are experiencing performance problems or errors, the edge router is one of the first points to check.

To assess the global health of your edge router connections, you can display its connections table, global HTTP response codes, resource usage, number of requests per service or URL, etc.

Connection Table

Connections Stats

Istio's Mixer has several adapters to which it forwards the telemetry information; you can use the mixer_runtime_dispatches_total metric segmented by adapter to visualize this information.

You can use the "runtime" metrics to monitor Istio's Mixer communication with the adapters (Prometheus, Kubernetes):

Istio mixer adapters

  • mixer_runtime_dispatch_duration_seconds_bucket
  • mixer_runtime_dispatch_duration_seconds_count
  • mixer_runtime_dispatch_duration_seconds_sum

And these are the internal Mixer gRPC metrics to monitor:

  • Incoming requests: grpc_io_server_completed_rpcs
  • Response duration: grpc_io_server_server_latency_bucket
  • Error rates: grpc_server_handled_total, ordering by the label grpc_code (see the sample query below)

Istio mixer grpc
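Translating the last of these into a concrete query against the bundled Prometheus server (a sketch based on the metric and label names listed above, assuming the Prometheus port-forward is still active; adjust the names if your Mixer version exposes different ones):

$ curl -s http://localhost:9090/api/v1/query \
    --data-urlencode 'query=sum(rate(grpc_server_handled_total{grpc_code!="OK"}[5m])) by (grpc_code)'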

Monitor Istio A/B deployments and canary deployments

One of Istio's major features is the ability to establish intelligent routing based on the service version.

The pods that provide the backend for a certain service will have different Kubernetes labels:

Labels:         app=reviews
                pod-template-hash=3187719182
                version=v3

These different backends are transparent to the consumer (service or final user), but Istio can take advantage of this information to perform:

  • Content-based routing: For example, if the user agent is a mobile phone, you can change the specific service that formats the final HTML template.
  • A/B deployments: Two similar versions of the service that you want to compare in production.
  • Canary deployments: An experimental service version that will only be triggered by certain conditions (like some specific test users).
  • Traffic shifting: Progressive migration to the new service version while keeping the old version fully functional.

Aggregating Istio and Sysdig metrics, you can supervise these service migrations with all the information you need to make further decisions.

For example, here we are comparing the alpha and beta service pods. They provide the same Kubernetes service and, using Istio traffic shifting, we decided to split the ingress traffic 50-50.
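A minimal sketch of how such a 50/50 split could be declared, assuming hypothetical alpha and beta subsets selected by the pods' version labels (the names here are illustrative, not the exact manifests behind the screenshots below):

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: myservice
spec:
  host: myservice
  subsets:
    - name: alpha
      labels:
        version: alpha
    - name: beta
      labels:
        version: beta
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: myservice
spec:
  hosts:
    - myservice
  http:
    - route:
        - destination:
            host: myservice
            subset: alpha
          weight: 50
        - destination:
            host: myservice
            subset: beta
          weight: 50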

As you can see, the number of requests and the duration of requests (two top graphs) are extremely similar, so we can assume it's a fair comparison in terms of load.

If you look at the two bottom graphs, it turns out that service alpha is suffering almost three times the number of HTTP errors, and its worst-case response time (99th percentile, bottom-right graph) is also significantly higher than service beta's. Looks like our developers did a great job with the new version :).

Istio Sysdig A/B deployment

Conclusions

Istio solves the "mesh tangle" by adding a transparent proxy as a sidecar to your service-provider pods. From this vantage position, it can collect fine-grained metrics and dynamically modify the routing flow without interfering with the pod software at all.

This strategy nicely complements Sysdig's analogous non-intrusive, minimal-instrumentation approach, keeping your service pods simple and infrastructure agnostic (as they should be®).

Now you are collecting and organizing your service metrics into nice-looking dashboards. Do you know which metrics are really important to measure service quality and diagnose correct application behavior? We recommend you continue reading about the four golden signals of monitoring.
