How to monitor Istio, the Kubernetes service mesh

In this article we are going to deploy and monitor Istio on a Kubernetes cluster. Istio is a service mesh platform that offers advanced routing, load balancing, security and high-availability features, plus Prometheus-style metrics for your services out of the box.

What is Istio?

Istio is a platform used to interconnect microservices. It provides advanced network features like load balancing, service-to-service authentication and monitoring, without requiring any changes in service code.

In the Kubernetes context, Istio deploys an Envoy proxy as a sidecar container inside every pod that provides a service.

These proxies mediate every connection, and from that position they route the incoming / outgoing traffic and enforce the different security and network policies.

This dynamic group of proxies is managed by the Istio “control plane”, a separate set of pods that orchestrate the routing, security, live ruleset updates, etc.

Istio architecture overview

Detailed descriptions of each subsystem component are available in the Istio project docs.

Service mesh explained: The rise of the “service mesh”

Containers are incredibly light and fast, so it’s no surprise that their density is roughly one order of magnitude greater than that of virtual machines. Classical monolithic component interconnection diagrams are rapidly turning into highly dynamic, fault-tolerant, N-to-N communications with their own internal security rules, label-based routes, DNS and service directories, etc.: the famous microservice mesh.

This means that while autonomous software units (containers) are becoming simpler and more numerous, interconnecting them and troubleshooting distributed software behavior is actually getting harder.

And of course, we don’t want to burden containers with this complexity, we want them to stay thin and platform agnostic.

Kubernetes already offers a basic abstraction layer separating the service itself from the server pods. Several software projects are striving to tame this complexity, offering visibility, traceability and other advanced pod networking features. We already covered how to monitor Linkerd; let’s now talk about Istio.

Istio features overview

  • Intelligent routing and load balancing: Policies to map static service interfaces to different backend versions, allowing for A/B testing, canary deployments, gradual migration, etc. Istio also allows you to define routing rules based on HTTP-layer metadata like session tokens or user agent string.
  • Network resilience and health checks: timeouts, retry budgets, health checks and circuit breakers, ensuring that unhealthy pods can be quickly weeded out of the service mesh.
  • Policy Enforcement: Peer TLS authentication, pre-condition checking (whitelists and similar ACL), quota management to avoid service abuse and/or consumer starvation.
  • Telemetry, traceability and troubleshooting: telemetry is automatically injected into any service pod, providing Prometheus-style network and L7 protocol metrics. Istio also dynamically traces the flow and chained connections of your microservices mesh.

How to deploy Istio in Kubernetes

Istio deployment overview

The Istio developers have made it simple enough to deploy the platform on a new or existing Kubernetes cluster.

Just make sure that your Kubernetes version is 1.7.3 or newer with RBAC enabled, and that you don’t have an older version of Istio already installed on the system.

Following the installation instructions in the Istio docs, download and extract the latest release:

$ curl -L | sh -
$ cd istio-<version>
$ export PATH=$PWD/bin:$PATH

Now you need to decide whether or not you want mutual TLS authentication between pods. If you choose to enable TLS, your Istio services won’t be able to talk to non-Istio entities.

We will use the non-TLS version this time:

$ kubectl apply -f install/kubernetes/istio.yaml
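If you do want mutual TLS between sidecars, Istio releases of this era shipped an alternate manifest for it (file name taken from the Istio docs of the time; check the contents of your downloaded release):

```shell
# Install Istio with mutual TLS authentication enabled between sidecars.
# Note: Istio-enabled services will then refuse plain-text traffic from
# non-Istio workloads.
$ kubectl apply -f install/kubernetes/istio-auth.yaml
```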

Istio system services and pods will be ready in a few minutes:

$ kubectl get svc -n istio-system
NAME            CLUSTER-IP     EXTERNAL-IP        PORT(S)                                                            AGE
istio-ingress   <cluster-ip>   some-external-ip   80:32633/TCP,443:31389/TCP                                         1d
istio-mixer     <cluster-ip>   <none>             9091/TCP,15004/TCP,9093/TCP,9094/TCP,9102/TCP,9125/UDP,42422/TCP   1d
istio-pilot     <cluster-ip>   <none>             15003/TCP,443/TCP                                                  1d

$ kubectl get pod -n istio-system
NAME                             READY     STATUS    RESTARTS   AGE
istio-ca-1363003450-vp7pt        1/1       Running   0          1d
istio-ingress-1005666339-w1gcs   1/1       Running   0          1d
istio-mixer-465004155-nncrd      3/3       Running   0          1d
istio-pilot-1861292947-zlt8w     2/2       Running   0          1d

But this process can be simplified further using Helm, which is now the recommended method for installing Istio in your production environment.

To do so, first, we are going to create a service account for Tiller:

$ kubectl apply -f install/kubernetes/helm/helm-service-account.yaml

Helm needs to use this service account, so we are going to install it with:

$ helm init --service-account tiller

Now Helm has full privileges to install Istio in our cluster:

$ helm install install/kubernetes/helm/istio --name istio --namespace istio-system

You can customize the installation by disabling or enabling features; for example, the following command installs a minimal Istio setup:

$ helm install install/kubernetes/helm/istio --name istio-minimal --namespace istio-system \
      --set security.enabled=false \
      --set ingress.enabled=false \
      --set gateways.istio-ingressgateway.enabled=false \
      --set gateways.istio-egressgateway.enabled=false \
      --set galley.enabled=false \
      --set sidecarInjectorWebhook.enabled=false \
      --set mixer.enabled=false \
      --set prometheus.enabled=false \
      --set global.proxy.envoyStatsd.enabled=false \
      --set pilot.sidecar=false
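If you want to review every tunable option before choosing your `--set` flags, Helm (v2, as used here) can print the chart’s default values:

```shell
# Show all configurable values of the Istio chart and their defaults
$ helm inspect values install/kubernetes/helm/istio
```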

Istio system services and pods will be ready in a few minutes:

$ kubectl get svc -n istio-system
NAME                     TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)                                                                                                                   AGE
istio-citadel            ClusterIP      <cluster-ip>   <none>        8060/TCP,9093/TCP                                                                                                         24s
istio-egressgateway      ClusterIP      <cluster-ip>   <none>        80/TCP,443/TCP                                                                                                            24s
istio-galley             ClusterIP      <cluster-ip>   <none>        443/TCP,9093/TCP                                                                                                          24s
istio-ingressgateway     LoadBalancer   <cluster-ip>   <pending>     80:31380/TCP,443:31390/TCP,31400:31400/TCP,15011:31610/TCP,8060:32575/TCP,853:32272/TCP,15030:31668/TCP,15031:30585/TCP   24s
istio-pilot              ClusterIP      <cluster-ip>   <none>        15010/TCP,15011/TCP,8080/TCP,9093/TCP                                                                                     24s
istio-policy             ClusterIP      <cluster-ip>   <none>        9091/TCP,15004/TCP,9093/TCP                                                                                               24s
istio-sidecar-injector   ClusterIP      <cluster-ip>   <none>        443/TCP                                                                                                                   24s
istio-telemetry          ClusterIP      <cluster-ip>   <none>        9091/TCP,15004/TCP,9093/TCP,42422/TCP                                                                                     24s
prometheus               ClusterIP      <cluster-ip>   <none>        9090/TCP                                                                                                                  24s

$ kubectl get pod -n istio-system
NAME                                     READY     STATUS    RESTARTS   AGE
istio-citadel-5bbbc98c6d-r7fmm           1/1       Running   0          35s
istio-egressgateway-77dfd495df-4vxgq     1/1       Running   0          13s
istio-egressgateway-77dfd495df-qwkm6     1/1       Running   0          35s
istio-galley-744969c89-vm2r5             1/1       Running   0          35s
istio-ingressgateway-6bb7555c76-4vv2r    1/1       Running   0          35s
istio-pilot-6cbbb9bd95-vnkwr             2/2       Running   0          35s
istio-policy-755477988-5ztwk             2/2       Running   0          35s
istio-sidecar-injector-856b74c95-g4lb9   1/1       Running   0          35s
istio-telemetry-78f76f9d6-kvcfc          2/2       Running   0          35s
prometheus-65d6f6b6c-srmwq               1/1       Running   0          35s

Injecting the Istio Envoy proxy in your existing Kubernetes pods

As we mentioned in the architecture overview, any service pod needs to be bundled with the Envoy container. On older clusters with the alpha features enabled, this was done by deploying the Istio initializer:

$ kubectl apply -f install/kubernetes/istio-initializer.yaml

With the Helm-based install, the sidecar is instead injected automatically by a mutating admission webhook (unless you deactivated it at install time with --set sidecarInjectorWebhook.enabled=false); you just need to set the istio-injection label on a namespace:

$ kubectl label namespace default istio-injection=enabled
$ kubectl get namespace -L istio-injection
  NAME           STATUS    AGE       ISTIO-INJECTION
  default        Active    121m      enabled
  istio-system   Active    42m
  kube-public    Active    121m
  kube-system    Active    121m

If we deploy the example app:

$ kubectl apply -f samples/sleep/sleep.yaml

We should see the deployment with one container:

$ kubectl describe deployment sleep
  Name:                   sleep
  Namespace:              default
  CreationTimestamp:      Tue, 19 Mar 2019 11:46:54 +0100
  Labels:                 app=sleep
  Selector:               app=sleep
  Replicas:               1 desired | 1 updated | 1 total | 1 available | 0 unavailable
  StrategyType:           RollingUpdate
  MinReadySeconds:        0
  RollingUpdateStrategy:  1 max unavailable, 1 max surge
  Pod Template:
    Labels:  app=sleep
    Containers:
     sleep:
      Image:        pstauffer/curl
      Port:         <none>
      Host Port:    <none>
      Environment:  <none>
      Mounts:       <none>
    Volumes:        <none>
  Conditions:
    Type           Status  Reason
    ----           ------  ------
    Available      True    MinimumReplicasAvailable
  OldReplicaSets:  <none>
  NewReplicaSet:   sleep-86cf99dfd6 (1/1 replicas created)
  Events:
    Type    Reason             Age   From                   Message
    ----    ------             ----  ----                   -------
    Normal  ScalingReplicaSet  16m   deployment-controller  Scaled up replica set sleep-86cf99dfd6 to 1

But the number of containers in the pod is two, since one of them is the sidecar:

  $ kubectl get pods
  NAME                     READY     STATUS    RESTARTS   AGE
  sleep-86cf99dfd6-fz2bj   2/2       Running   0          5m

OK, it looks like our pods and services have been correctly instrumented, but what happened here?

The provided MutatingWebhookConfiguration selects the pods in namespaces labeled istio-injection=enabled; when the webhook is invoked by Kubernetes, it injects the sidecar container into them.

You can configure the injection policy and the sidecar injection template by modifying the istio-sidecar-injector ConfigMap in the istio-system namespace.
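To take a look at the current injection template before modifying anything (a read-only command, safe to run):

```shell
# Dump the sidecar injector configuration, including the injection template
$ kubectl -n istio-system get configmap istio-sidecar-injector -o yaml
```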

Alternatively, instead of using automatic injection, you can rewrite your existing YAML definitions on the fly using istioctl (probably only a good idea for learning purposes):

$ kubectl create -f <(istioctl kube-inject -f <your-app-spec>.yaml)

Let’s try with a simple single-container deployment and service:

$ kubectl apply -f <(istioctl kube-inject -f flask.yaml)
$ kubectl logs flask-1027288086-lr4fm
a container name must be specified for pod flask-1027288086-lr4fm, choose one of: [flask istio-proxy]
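What did kube-inject actually add? Roughly, the rewritten pod spec gains an Envoy sidecar and an init container alongside your app. This is a simplified sketch, not the full injected spec; the exact images, tags and arguments depend on your Istio release, and the flask image name is just this example’s placeholder:

```yaml
# Sketch of the pod spec after istioctl kube-inject (abbreviated)
spec:
  initContainers:
  - name: istio-init                  # sets up iptables redirection to the proxy
    image: docker.io/istio/proxy_init:<version>
  containers:
  - name: flask                       # your original application container
    image: <your-flask-image>
  - name: istio-proxy                 # injected Envoy sidecar
    image: docker.io/istio/proxyv2:<version>
    args: ["proxy", "sidecar"]        # plus many discovery/control-plane flags
```

This is why kubectl logs asked you to choose between the flask and istio-proxy containers.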

How to monitor Istio using Prometheus

One of the major infrastructure enhancements of tunneling your service traffic through the Istio Envoy proxies is that you automatically collect fine-grained metrics that provide high-level application information (since they are reported for every service proxy).

These individual metrics are then forwarded to the Mixer component, which aggregates them for the entire mesh.

Mixer provides three Prometheus endpoints:

  1. istio-mesh (istio-mixer.istio-system:42422): all Mixer-generated mesh metrics.
  2. mixer (istio-mixer.istio-system:9093): all Mixer-specific metrics. Used to monitor Mixer itself.
  3. envoy (istio-mixer.istio-system:9102): raw stats generated by Envoy (and translated from StatsD to Prometheus).
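If you run your own Prometheus server, a scrape configuration for these three endpoints could look roughly like this (a sketch based on the endpoints above; the Prometheus deployment bundled with Istio ships an equivalent, more complete configuration):

```yaml
# prometheus.yml fragment (sketch): scrape the three Mixer endpoints
scrape_configs:
  - job_name: 'istio-mesh'
    static_configs:
      - targets: ['istio-mixer.istio-system:42422']
  - job_name: 'mixer'
    static_configs:
      - targets: ['istio-mixer.istio-system:9093']
  - job_name: 'envoy'
    static_configs:
      - targets: ['istio-mixer.istio-system:9102']
```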

The Istio project also provides examples and documentation on configuring a Prometheus server to scrape and analyze the most relevant metrics. The Istio Helm chart can also deploy a Prometheus server automatically configured to scrape these metrics:

$ helm upgrade istio install/kubernetes/helm/istio --set prometheus.enabled=true

Wait until the pod is ready, and forward the Prometheus server port to your local machine:

$ kubectl -n istio-system port-forward $(kubectl -n istio-system get pod -l app=prometheus -o jsonpath='{.items[0].metadata.name}') 9090:9090 &

You can now access the Prometheus server UI opening http://localhost:9090/ in your web browser.
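From the UI you can start querying the mesh metrics directly. A few hedged starting points, assuming the standard Istio metric names of this era (istio_request_count with destination_service and response_code labels, as referenced later in this article):

```promql
# Global request rate across the mesh, over the last 5 minutes
sum(rate(istio_request_count[5m]))

# Request rate broken down by destination service
sum(rate(istio_request_count[5m])) by (destination_service)

# Share of requests returning 5xx, per destination service
sum(rate(istio_request_count{response_code=~"5.."}[5m])) by (destination_service)
  / sum(rate(istio_request_count[5m])) by (destination_service)
```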

Sysdig monitoring Istio

There is also a Grafana deployment, already preconfigured and ready to test, in the Istio repository:

$ helm upgrade istio install/kubernetes/helm/istio --set grafana.enabled=true

Again, wait for the pod and service to be up and running, and redirect the Grafana service port:

$ kubectl -n istio-system port-forward $(kubectl -n istio-system get pod -l app=grafana -o jsonpath='{.items[0].metadata.name}') 3000:3000 &

You can access the out of the box dashboard at http://localhost:3000/dashboard/db/istio-dashboard.

Grafana monitoring Istio

How to monitor Istio with Sysdig scraping Prometheus metrics

The fact that Istio core services use the Prometheus metrics format is very convenient because, as you know, Sysdig automatically detects and scrapes Prometheus endpoints.

Let’s edit the Sysdig agent configuration file (dragent.yaml) to configure which pods and ports should be scraped:

  prometheus:
    enabled: true
    process_filter:
      - include:
          port: 42422
      - include:
          port: 9102
      - include:
          port: 9093
Make sure that Prometheus is enabled and then write an include filter. For this example we use Kubernetes annotations; this way we can easily keep adding hosts without changing the agent configuration again.

Let’s annotate the Mixer deployments (your specific pod name suffixes will vary):

$ kubectl -n istio-system patch deployment istio-policy -p '{"spec":{"template":{"metadata":{"annotations":{"prometheus.io/scrape": "true", "prometheus.io/port": "9093"}}}}}'
$ kubectl -n istio-system patch deployment istio-telemetry -p '{"spec":{"template":{"metadata":{"annotations":{"prometheus.io/scrape": "true"}}}}}'
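Assuming the standard Prometheus annotation keys (prometheus.io/scrape and prometheus.io/port), the patched pod template simply ends up with metadata like this, which you could equally declare directly in the deployment YAML:

```yaml
# Pod template metadata after patching (standard Prometheus annotations)
spec:
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9093"
```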

Logging into the Sysdig Monitor web console, we can check that the new metrics are indeed flowing into our cloud platform (metricCount.prometheus).

Istio Sysdig Scraping Prometheus

We are scraping Istio Prometheus metrics endpoints, time to monitor Istio!

Monitoring Istio: reference metrics and dashboards

Let’s start from the beginning, monitoring our services and application behaviour.

Segmenting by service and service version, these are a few of the usual metrics that you will want to monitor (and build the associated dashboards for):

  • Number of requests istio_request_count
  • Request duration istio_request_duration.avg, istio_request_duration.count
  • Request size http_request_size_bytes.count
  • Also, the 90 and 99 percentiles of these metrics, to delimit your worst-case scenario:
    • http_request_size_bytes.90percentile
    • http_request_duration_microseconds.90percentile
    • http_request_size_bytes.99percentile
    • http_request_duration_microseconds.99percentile
  • HTTP Error codes response_code
  • Bandwidth and disk IO consumption in your serving pods
  • Top accessed URLs net.http.url, net.http.method

Using the Sysdig dashboard wizard, you can quickly assemble your custom service overview with the most important metrics:

Istio Sysdig Dashboard Example

Or just use our Istio default dashboards.

Istio System Overview

Istio Sysdig Dashboard Overview

Istio Service

Istio Sysdig Service Dashboard

If you are familiar with Sysdig, or have read other articles related to Kubernetes and multiple HTTP services, you will soon realize that you already had similar HTTP / network metrics out of the box.

So, apart from the essentials, let’s highlight some of the additional features that Istio brings to the table in terms of monitoring.

HTTP request and response sizes (in bytes), not just aggregate network bandwidth but measured individually for each HTTP connection:

Istio Sysdig Bandwidth Service

You have specific metrics to monitor gRPC connections, segmented by method, code, service, type, etc.; Istio is able to route HTTP/2 and gRPC through its proxies.

Istio Sysdig GRPC

Thanks to Istio’s connection traceability, you can also monitor the mentioned metrics (request count, duration, etc.) not only from the destination but also from the source internal service (or a version thereof):

Istio Sysdig Source Service

Monitoring Istio internals

Apart from monitoring the services, you can use Istio and Sysdig aggregated metrics to monitor Istio internal services health and performance.

Istio provides its own Ingress controller; this is a very relevant piece of our infrastructure to monitor. When your users are experiencing performance problems or errors, the edge router is one of the first points to check.

To assess the global health of your edge router connections, you can display its connections table, global HTTP response codes, resource usage, number of requests per service or URL, etc.

Istio Sysdig Connection Table

Istio Sysdig Connections Stats

Istio’s Mixer has several adapters to which it forwards information; you can use the mixer_adapter_dispatch_count metric, segmented by adapter, to monitor these connections.

Istio Sysdig Mixer Info

Mixer will also be contacted by the services to retrieve authorization and precondition info; you can monitor these connections (and their result codes):

Istio Sysdig Mixer Info 2

You can use the “runtime” metrics to monitor the communication between Istio’s Mixer and its adapters (Prometheus, Kubernetes):

  • mixer_runtime_dispatch_count
  • mixer_runtime_dispatch_duration.avg
  • mixer_runtime_dispatch_duration.count

Monitoring Istio A/B deployments and canary deployments

One of Istio’s major features is the ability to establish intelligent routing based on service version.

The pods that provide the backend for a certain service will have different Kubernetes labels:

Labels:         app=reviews

These different backends are transparent to the consumer (service or final user) but Istio can take advantage of this information to perform:

  • Content-based routing: for example, if the user agent is a mobile phone, you can change the specific service that formats the final HTML template.
  • A/B deployments: two similar versions of the service that you want to compare in production
  • Canary deployment: Experimental service version that will only be triggered by certain conditions (like some specific test users)
  • Traffic Shifting: Progressive migration to the new service version maintaining the old version fully functional
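As a sketch of what traffic shifting looks like in practice (Istio 1.x networking API assumed; the reviews service and its v1/v2 subsets are illustrative names, not something deployed earlier in this article), a 50-50 split between two backend versions could be declared like this:

```yaml
# Hypothetical VirtualService splitting traffic 50/50 between two subsets
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - route:
    - destination:
        host: reviews
        subset: v1
      weight: 50
    - destination:
        host: reviews
        subset: v2
      weight: 50
---
# DestinationRule defining the subsets by the pods' version label
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: reviews
spec:
  host: reviews
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
```

Adjusting the weights progressively is how a gradual migration from v1 to v2 is carried out.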

Aggregating Istio and Sysdig metrics, you can supervise these service migrations with all the information you need to make further decisions.

For example, here we are comparing the alpha and beta service pods. They provide the same Kubernetes service and, using Istio traffic shifting, we decided to split ingress traffic 50-50.

As you can see, the number of requests and the duration of requests (two top graphs) are extremely similar, so we can assume it’s a fair comparison in terms of load.

If you look at the two bottom graphs, it turns out that service alpha is suffering almost 3 times the number of HTTP errors, and its worst-case response time (99th percentile, bottom-right graph) is also significantly higher than service beta’s. Looks like our developers did a great job with the new version :).

Istio Sysdig AB Deployment


Istio solves the “mesh tangle” by adding a transparent proxy as a sidecar to your service-provider pods. From this vantage point, it can collect fine-grained metrics and dynamically modify the routing flow without interfering with the pod software at all.

This strategy nicely complements Sysdig’s analogous non-intrusive, minimal-instrumentation approach, keeping your service pods simple and infrastructure agnostic (as they should be).

The service mesh will be a hot topic for Kubernetes in 2018, and the jury is still out; we will keep an eye on the ecosystem to compare the family of service mesh infrastructures growing around the container stack.
