Sysdig 2019 Container Usage Report: New Kubernetes and security insights

By Eric Carter - OCTOBER 29, 2019

usage report

We’re excited today to release the Sysdig 2019 Container Usage Report. Continued momentum for Kubernetes and greater adoption of cloud-native architectures are changing not just usage patterns, but processes and organizational structures as well. One of the surprising insights this year is the 2X increase in the number of containers that live for less than five minutes. As services grow more dynamic, cloud teams are recognizing the need to integrate security into their DevOps processes. For the first time we explore security and compliance details as a part of the 2019 usage report in addition to a range of details about how customers are using containers, Kubernetes, and more.

Sysdig’s unique vantage point

The Sysdig Secure DevOps Platform gives us a real-world look into our customers’ infrastructure, applications, and containers. These customers include companies around the world and across a broad range of industries. This year, we incorporate details from both SaaS and on-prem Sysdig users to provide a snapshot of enterprise usage across well over two million deployed containers.

Let’s dig in to the results.

What container platforms are being deployed?

In our 2018 report, we described how the Open Container Initiative (OCI) was helping to usher in alternate container runtimes. This has happened in a big way in 2019, with containerd grabbing a significant 18% share. To be fair, it’s important to note that containerd is used by Docker. The Docker engine previously implemented both high-level and low-level runtime features. These are now broken out into separate containerd and runc projects.

container runtimes

CRI-O makes its debut this year. One thing that surprised us is the small adoption rate to date. CRI-O, a lightweight runtime for Kubernetes, started at Red Hat in 2016 and was adopted into the CNCF® in 2019. We expect its use to climb as customers running Red Hat OpenShift migrate from v3 to v4, where CRI-O replaces the previously provided Docker engine.

Containers-per-host density increases 100%

Over the past year, the median number of containers per host doubled to 30, compared to 15 in 2018 and 10 in 2017.

We expected this number to increase based on several factors:

  • Growth in the number of applications being transitioned to cloud-native infrastructure
  • Inclusion of data from on-premises Sysdig customers who run larger, denser clusters
  • Increases in compute “horsepower,” enabling more containers to run on each node

For 2019, the maximum per-node density we saw was 250 containers – a 38% increase from 2018.

Container orchestration: Kubernetes dominates

It’s no surprise that, as the de facto container orchestration tool, Kubernetes takes a whopping 77% share of orchestrators in use. That number expands to 89% when you add in Red Hat OpenShift and Rancher – both built with Kubernetes. Here’s the current breakdown:

container orchestrators

Which platform do on-prem customers choose?

When we separate the data for companies that deploy the Sysdig platform on-premises, the orchestration picture changes significantly. Red Hat OpenShift Container Platform comes out on top in this segment. This is primarily because these organizations – typically larger and more risk-averse – want the advantages of Kubernetes, but prefer to get them through a commercially supported on-prem Platform-as-a-Service (PaaS) solution like OpenShift.

on-prem orchestration

We also explore the breakdown of public cloud use in the 2019 report. Download now to see the details.

Security and Compliance

“Shift security left” has become a buzz phrase that refers to building security into earlier stages of the development lifecycle. Organizations working with containers are in fact recognizing the need to integrate security and compliance into their DevOps workflows. To provide insights into the state of security and compliance with containers and Kubernetes, we analyzed data around vulnerability scanning, runtime security, and CIS compliance.

Vulnerability management

Customers scan images to identify, block, and resolve container vulnerabilities within CI/CD pipelines and container registries. In the full report, we look at the registries in use and the percentage of images pulled from public vs. private repositories. We also sample the success/fail rate when scanning images for vulnerabilities. Here are a few things we learned.

Image pulls: public vs. private

How many container images are pulled from public vs. private repositories? We found that 40% of images come from public sources.

image pulls

The risk of using container images from public repositories is that few are validated or checked for security vulnerabilities. Using Docker Hub as an example, images labeled “Certified,” “Official,” or “Verified Publisher” are likely trustworthy. However, of the nearly 3 million images hosted, fewer than 1% carry these designations. To reduce the risk, cloud teams are creating policies to define which container registries are approved for use in their organizations.
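An approved-registry policy like the one described above can be sketched in a few lines of Python. This is an illustration only, not how Sysdig or any particular admission controller implements it: the allowlist contents and the function names are hypothetical. The registry is derived using Docker's convention that the first path component is a registry host only if it contains a "." or ":" or is "localhost"; anything else is treated as Docker Hub.

```python
# Hypothetical allowlist: the registry name is a placeholder, not a real policy.
APPROVED_REGISTRIES = {"registry.example.com"}


def registry_of(image_ref: str) -> str:
    """Return the registry host implied by an image reference.

    Docker convention: the first path component is a registry only if it
    contains "." or ":" or equals "localhost"; otherwise Docker Hub is implied.
    """
    first, sep, _rest = image_ref.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first
    return "docker.io"


def is_approved(image_ref: str, approved=APPROVED_REGISTRIES) -> bool:
    """True when the image comes from a registry on the allowlist."""
    return registry_of(image_ref) in approved


if __name__ == "__main__":
    for ref in ("nginx:1.17", "registry.example.com/team/app:1.0"):
        print(ref, "->", registry_of(ref), "approved:", is_approved(ref))
```

In practice a check like this would typically run in a CI/CD step or at admission time, so that unapproved images are rejected before they reach production.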

Image scanning

Regardless of the source of the container images, performing image scanning to identify known vulnerabilities prior to deploying into production is a best practice that should not be skipped. To quantify the scope of the risk of vulnerabilities, we sampled pass and fail rates for images scanned over a five-day period. Over half of the images failed, meaning they were found to have known vulnerabilities with a severity of high or greater.

scanning results
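To make the pass/fail criterion concrete, here is a minimal sketch of a scan gate that fails an image when any finding has a severity of high or greater, plus a helper that computes the fail rate over a batch of scans. The severity names, the shape of a finding, and the function names are assumptions for illustration; real scanners report findings in their own formats.

```python
# Assumed severity scale, lowest to highest; real scanners may use other names.
SEVERITY_RANK = {"negligible": 0, "low": 1, "medium": 2, "high": 3, "critical": 4}


def scan_passes(findings, fail_at="high"):
    """Return True if no finding is at or above the `fail_at` severity."""
    threshold = SEVERITY_RANK[fail_at]
    return all(SEVERITY_RANK[f["severity"].lower()] < threshold for f in findings)


def fail_rate(scans):
    """Fraction of scans (each a list of findings) that fail the gate."""
    if not scans:
        return 0.0
    failed = sum(1 for findings in scans if not scan_passes(findings))
    return failed / len(scans)


if __name__ == "__main__":
    # Made-up sample data: three images, two with a high-or-greater finding.
    sample_scans = [
        [{"id": "VULN-A", "severity": "Low"}],
        [{"id": "VULN-B", "severity": "High"}],
        [{"id": "VULN-C", "severity": "Medium"}, {"id": "VULN-D", "severity": "Critical"}],
    ]
    print(f"fail rate: {fail_rate(sample_scans):.0%}")
```

An image with no findings passes, since `all()` over an empty list is true.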

Runtime security threats

Once known vulnerabilities have been addressed in development, cloud teams should set policies to detect anomalous behavior and trigger security alerts in production. Runtime security for Kubernetes is something organizations are just starting to address – but it’s happening quickly. In the last 12 months, there have been over 6.7 million Docker Hub pulls of Falco, the CNCF open-source runtime security project contributed by Sysdig. That’s an increase of 252% over the prior year.

We looked at policy violations as measured by the volume of alerts customers receive from Sysdig Secure, which automates runtime security with Falco policies. This indicates the types of runtime security risks that container users encounter most frequently. We found that the top container runtime security risks encountered include:

runtime security

In the full usage report, we detail the top 10 violations in order of frequency, along with a description of each to explain the possible threat.

Compliance

To reduce risk and meet compliance standards including PCI-DSS, HIPAA, and GDPR, organizations should regularly check hosts and containers against a set of best practices. Audits performed using built-in CIS benchmark for Docker checks in Sysdig Secure reveal room for improvement. For example, we found that on the median, container hosts have:

container compliance

The top 10 open-source solutions running in containers

Open source has changed the face of enterprise computing. It powers innovation across not just infrastructure, but especially application development. Sysdig auto-discovers the processes inside containers to get instant insight into the solutions that make up the cloud-native services that our customers run in production. Here are the top 10:

open source containers

What’s new this year is the arrival of Node.js and Go (aka golang), both of which have overtaken Java in use. Java has long been one of the most prominent programming languages. DevOps and cloud teams appear to favor newer options like Go, created by Google engineers, in part because of their ease of use. For example, Node.js, a JavaScript runtime, simplifies writing code that runs equally well on servers and in browsers. It’s also well suited for the new generation of databases like CouchDB and MongoDB, which support queries written in JavaScript.

Container lifespans

The measure of how long (or how short) containers, container images, and services live was one of the most popular data points from our 2018 report. It reflects just how dynamic modern applications are from both a development and a runtime perspective.

The short life of containers

Comparing container lifespans year over year, we found that the share of containers that are alive for 10 seconds or less has doubled to 22%. In fact, the share of containers that live for 5 minutes or less grew by 2X as well.

container lifespans
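As a rough illustration of how a lifespan figure like this is derived, the sketch below computes the share of containers whose lifetime falls at or under a threshold. The sample lifespans are made up for the example and are not taken from the report; the function name is hypothetical.

```python
def share_living_at_most(lifespans_seconds, limit_seconds):
    """Fraction of containers whose lifespan is at most `limit_seconds`."""
    if not lifespans_seconds:
        return 0.0
    short = sum(1 for s in lifespans_seconds if s <= limit_seconds)
    return short / len(lifespans_seconds)


if __name__ == "__main__":
    # Hypothetical lifespans in seconds for ten containers.
    lifespans = [4, 8, 45, 200, 900, 3600, 86400, 7, 120, 30]
    print("10 seconds or less:", share_living_at_most(lifespans, 10))
    print("5 minutes or less: ", share_living_at_most(lifespans, 5 * 60))
```

With this sample, three of the ten containers live 10 seconds or less and seven live 5 minutes or less.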

Many containers need to live only long enough to execute a function and then terminate when it’s complete. Seconds may seem short, but for some processes, it’s all that is required. We believe the increased use of Kubernetes Jobs that run finite tasks like batch jobs contributed to this growth. In fact, we expect short lifespans to increase, especially on serverless platforms that are well-suited to running short-term tasks.

The ephemeral nature of containers is one of the technology’s unique advantages. At the same time, it can make security, health, and performance issues hard to see. Monitoring, security, and compliance tools that provide real-time visibility into short-lived processes are key to successful operations.

Continuous development and image lifespans

Containers are a perfect companion to the agile movement. They help accelerate the development and release of code, often as containerized microservices. We found that over half of container images are replaced – aka churn – in a week or less. This reflects the reduction in the time between code releases. Further, it indicates that CI/CD pipelines are helping developer teams deliver software updates at a faster cadence than ever before.

image lifespans

Custom metrics

Custom metrics give developers and DevOps teams a way to instrument code to collect unique metrics. Of the three mainstay solutions, JMX, StatsD, and Prometheus, the past year saw Prometheus rise as the top solution in use. In fact, year-over-year, Prometheus metric use increased 130% across our customers who use custom metrics. JMX metrics (for Java apps) and StatsD are diminishing, down 45% and 17% respectively, as the use of new programming frameworks that support Prometheus expands.

custom metrics
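For readers new to Prometheus custom metrics, the sketch below shows roughly what an instrumented application exposes: a counter rendered in the Prometheus text exposition format (a `# HELP` line, a `# TYPE` line, then one sample per label set). It uses only the Python standard library to stay self-contained; a real service would normally use an official Prometheus client library instead. The metric name, labels, and function name are hypothetical, and label-value escaping is omitted for brevity.

```python
def render_counter(name, help_text, samples):
    """Render a counter in the Prometheus text exposition format.

    `samples` is a list of (labels_dict, value) pairs. Escaping of special
    characters in label values is deliberately left out of this sketch.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples:
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        series = f"{name}{{{label_str}}}" if label_str else name
        lines.append(f"{series} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical application metric: orders processed, split by outcome.
    print(render_counter(
        "orders_processed_total",
        "Orders processed by the service.",
        [({"status": "ok"}, 1027), ({"status": "error"}, 3)],
    ), end="")
```

A monitoring agent or Prometheus server would scrape text like this from an HTTP endpoint exposed by the application.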

Check out the full report to see the top Prometheus metrics and exporters used by Sysdig customers.

Top alert conditions

The alerts set by Sysdig users showcase what cloud teams see as most disruptive to container operations. The most commonly used alert conditions have shifted in favor of Kubernetes infrastructure while continuing to focus on resource utilization and uptime. Of more than 800 unique alert conditions used across Sysdig customers, here are the top 3:

top alerts

In addition, alerts can be fine-tuned or “scoped” to specific tags or Kubernetes / cloud labels. For instance, using an example from the above alerts, you can specify a cpu.used.percent alert for an individual namespace like “istio-system”, or for a specific Pod name like “envoy” inside that namespace. Check out the top alert scopes in the full report.
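Conceptually, a scoped alert is a threshold check that only applies to samples whose labels match the scope. The sketch below illustrates that idea. It is not Sysdig's alert syntax or API: the label keys ("namespace", "pod"), the sample structure, the threshold value, and the function names are all assumptions for the example.

```python
def in_scope(labels, scope):
    """True when every key/value pair in `scope` is present in `labels`."""
    return all(labels.get(key) == value for key, value in scope.items())


def should_alert(sample, scope, threshold):
    """Fire only for in-scope samples whose value exceeds the threshold."""
    return in_scope(sample["labels"], scope) and sample["value"] > threshold


if __name__ == "__main__":
    # Hypothetical cpu.used.percent samples with simplified label keys.
    samples = [
        {"labels": {"namespace": "istio-system", "pod": "envoy"}, "value": 93.0},
        {"labels": {"namespace": "default", "pod": "web"}, "value": 97.0},
    ]
    scope = {"namespace": "istio-system"}  # narrow further with "pod": "envoy"
    for s in samples:
        print(s["labels"], "alert:", should_alert(s, scope, threshold=90.0))
```

An empty scope matches every sample, which corresponds to an unscoped alert.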

Kubernetes usage patterns

How many clusters are customers operating? How many Pods run per node? Does anyone use Kubernetes Jobs? The 2019 report answers these questions and more. Here’s a sample of what customers are deploying with Kubernetes.

Some customers maintain a few clusters – some small, some large – while others have a sizeable estate of many clusters of varying sizes. The following charts provide a distribution of cluster count and nodes per cluster for Sysdig platform users:

cluster stats

The large number of customers with a single cluster, and the relatively small number of nodes, is an indication that many enterprises are still early in their use of Kubernetes. We’ve also recognized that the use of managed Kubernetes services in public clouds impacts these data points. With services like Amazon Elastic Kubernetes Service (EKS), Google Kubernetes Engine (GKE), Azure Kubernetes Service (AKS), and IBM Cloud Kubernetes Service (IKS), users can spin up and tear down clusters quickly as needed.

Pods per cluster

Pods are the smallest deployable object in Kubernetes. They contain one or more containers with shared storage and network, as well as a specification for how to run the containers. Here’s the breakdown across Sysdig platform users:

Pods per cluster

Note: This chart has been updated to correct an error in the original image. Big thanks to Chris Collins – aka @ChrisInDurham – for spotting the problem!

Pods per node

A Pod remains on a node until its process is complete, the Pod is deleted, the Pod is evicted from the node due to lack of resources, or the node fails. Here’s a snapshot of pods per node across Sysdig platform users:

pods per node

Insights into number of Kubernetes namespaces, deployments, StatefulSets, and Jobs are available in the full report.

Conclusion

With container density doubling since our last usage report, it’s evident that the rate of adoption is accelerating as usage matures. The key insights from our third annual report highlight the need for enterprises to take steps to prepare for the massive growth expected:

  • Organizations should invest in Kubernetes-native tools to simplify operating at scale.
  • Real-time visibility that provides detailed audit and forensics records for short-lived containers is critical to secure operations.
  • To keep ahead of runtime risks, cloud teams must act now to integrate security into DevOps.
  • As Prometheus extends its lead as the standard for cloud-native application metrics, users must learn how to leverage it reliably and at scale.

Download the full Sysdig 2019 Container Usage Report for all the details now. Also, check out our recent cloud native and security reports.