Our 2018 Docker Usage Report provides an inside look at shifting container trends as revealed by a point-in-time snapshot of real-world container usage as reported by the Sysdig Monitor and Sysdig Secure cloud service.
The quick summary: Organizations are getting more bang for their hardware buck by packing in 50% more containers per host, Docker still rules the roost but brand name container runtime environments are making inroads, and Kubernetes is still the king of container orchestration.
Our second annual Docker usage report shows tremendous momentum across the Docker container ecosystem year-over-year. As more organizations transition to DevOps and microservices models, and advance their expertise in the “modern stack”, we see more activity and more scale, but also an increased need to understand how all of the pieces work together.
Our study sample includes a broad cross-section of vertical industries and companies ranging in size from mid-market to large enterprises across North America, Latin America, EMEA, and Asia Pacific. Data from 90,000 containers – twice the sample size of last year – provides unique insight into use of containers in production.
What follows is a summary of the key findings. For full details, download your own copy of the report here.
The top 12 application components running in containers
Key Assessment: The old merges with the new
In addition, increased usage of database solutions like PostgreSQL and MongoDB running in containers signals that the move to stateful services in containers is under way. The ephemeral nature of containers left many concerned about running services that collect valuable corporate data in containers. That concern appears to be easing, as the data suggests customers are beginning to move to environments completely driven by containers.
Container density ratchets up
Key assessment: Median container density per host rises 50% year-over-year.
At the other end of the spectrum, in this survey we saw a customer running 154 containers on a single host! This is up from 95 that we observed last year. That’s a lot of containers.
What container runtimes are in use?
Key assessment: Docker still reigns, but we’re seeing what might be the first signs of cracks in the dam.
In fact they are. In the last year, customers have increased their use of other platforms. Rkt grew significantly to 12%, and Mesos containerizer to 4%. LXC also grew, although at a significantly lower rate. It appears from the data that customers have a greater comfort level with using “non-Docker” solutions in production.
Lifespan of containers and services
Key assessment: 95% of containers live less than a week
The following chart highlights the percentage of containers that appear and disappear over different intervals up to and beyond one week.
Eleven percent of containers stay alive for less than 10 seconds. The largest share – 27% – consists of containers that live between five and 10 minutes.
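A distribution like this can be derived by sorting observed container lifespans into intervals. The following is a minimal Python sketch of that bucketing, not the method used for the report. The bucket boundaries and the names `BUCKETS`, `bucket_for`, and `lifespan_distribution` are assumptions made for illustration, and the chart in the report may use different intervals.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical interval upper bounds in seconds (exclusive); the report's
# chart may use different boundaries.
BUCKETS = [
    (10, "< 10s"),
    (60, "10s-1m"),
    (300, "1m-5m"),
    (600, "5m-10m"),
    (3600, "10m-1h"),
    (86400, "1h-1d"),
    (604800, "1d-1w"),
]
OVERFLOW = ">= 1w"
BOUNDS = [upper for upper, _ in BUCKETS]


def bucket_for(lifespan_seconds):
    """Return the label of the interval that contains the given lifespan."""
    index = bisect_right(BOUNDS, lifespan_seconds)
    return BUCKETS[index][1] if index < len(BUCKETS) else OVERFLOW


def lifespan_distribution(lifespans):
    """Map each non-empty interval label to its percentage of all containers."""
    total = len(lifespans)
    if total == 0:
        return {}
    counts = Counter(bucket_for(seconds) for seconds in lifespans)
    return {label: 100.0 * n / total for label, n in counts.items()}
```

For example, `lifespan_distribution([5, 400, 400, 700000])` places one container under 10 seconds, two in the five-to-10-minute interval, and one beyond a week.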
Why do so many containers have such short lifespans? We know many customers have architected systems that scale as needed with demand and live only as long as they add value. Containers are created, do their work, and then go away. As an example, one customer spins up a container for each job they create in Jenkins. They test the change, then shut down the container. For them this takes place thousands of times a day.
We also looked at how long container images were in use. By looking at this data, we get an idea of how often customers are doing new deploys of updated containers as a part of their DevOps CI/CD process.
Orchestrators for Docker containers
Key assessment: First place goes to Kubernetes, followed by Kubernetes and then Kubernetes.
For example, Microsoft uses Kubernetes for its Azure Kubernetes Service (AKS), as does IBM with its Cloud Container Service and Cloud Private offering. Even Docker and Mesosphere have added support and functionality for Kubernetes. This means clear lines of demarcation no longer exist as they did in previous years. For instance, Mesosphere is able to deploy and manage “Kubernetes-as-a-service” in a DC/OS environment. Multiple Kubernetes clusters may be deployed under a single Mesosphere cluster.
For our report, we didn’t identify where customers use both orchestrators, but we plan to delve into that in the future.
Docker Swarm climbed into the second slot in this year’s study, surpassing Mesos-based tools. Given Docker has embraced Kubernetes, we didn’t expect this. Possible drivers include:
Swarm’s barrier to entry is incredibly low. While it may not have all the features of Kubernetes, as more people start with containers this may be the first stop in orchestration.
Docker Enterprise Edition, featuring the Docker Universal Control Plane (UCP) graphical user interface, simplifies many operational aspects of getting started with Swarm. Since Docker’s Kubernetes tie-in came late in 2017, any change in adoption in our customer base from Swarm to Kubernetes might be still forthcoming.
Figure 8. Kubernetes and Swarm grow orchestrator share, Mesos shrinks.
Cluster size influences orchestrator choice
Key assessment: Mesos owns the big cluster game.
While Mesos-based orchestration, including Mesos Marathon and Mesosphere DC/OS, dropped to third in this study, where Mesos is used, the median number of containers deployed is 50% higher than Kubernetes environments. This makes sense given Mesos tends to be targeted at large-scale container and cloud deployments. So while fewer in number, Mesos clusters are typically enterprise-scale.
Swarm clusters, conversely, were 30% smaller compared to Kubernetes.
Top flavors of Kubernetes
Key assessment: Here come the Kubernetes distributions.
Anecdotally, at Sysdig we see a greater percentage of our on-prem customers, who tend to be larger enterprises running Sysdig solutions in private data centers, adopt OpenShift in greater numbers than customers of our SaaS offering.
Rancher Labs emerged in 2015 with support for both Docker Swarm and Kubernetes. It wasn’t until 2017 that Rancher fully embraced Kubernetes as its orchestrator of choice.
Most popular alert conditions when using Docker containers
Key assessment: It’s all about performance and uptime
Responsiveness of app services is at the top of the list. Users want to know, “Is my app performing badly?” To see if a service is running well – or not – users look for the four “Golden Signals” – latency, traffic, errors, and saturation. Sysdig provides Golden Signal dashboards featuring these metrics to help answer the questions:
- How long does it take to service a request?
- How much demand is being placed on the system?
- How often do requests fail?
- How constrained are system resources?
With this knowledge, users have a good idea of whether the user experience is good or degraded.
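To make the four questions concrete, here is a rough Python sketch that computes one number per Golden Signal from a window of request records. It is an illustration under stated assumptions, not how Sysdig calculates its dashboards: the `Request` record, the `golden_signals` function, and the use of CPU as the saturation resource are all hypothetical choices.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Request:
    # Hypothetical record of one served request.
    duration_ms: float
    failed: bool


def golden_signals(requests, window_seconds, cpu_used, cpu_capacity):
    """Summarize latency, traffic, errors, and saturation for one time window."""
    n = len(requests)
    return {
        # How long does it take to service a request?
        "latency_ms": mean(r.duration_ms for r in requests) if n else 0.0,
        # How much demand is being placed on the system?
        "traffic_rps": n / window_seconds,
        # How often do requests fail?
        "error_rate": sum(r.failed for r in requests) / n if n else 0.0,
        # How constrained are system resources? (CPU chosen as an example)
        "saturation": cpu_used / cpu_capacity,
    }
```

In practice a percentile such as p95 is often a more useful latency measure than the mean; the mean is used here only to keep the sketch short.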
Response time is the most widely used alert type, closely followed by uptime/downtime alerts. Sysdig allows thresholds to be set so that temporary blips, which a well-orchestrated environment handles on its own, don’t result in false alarms.
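One common way to keep temporary blips from raising an alert is to require the threshold to be exceeded for several consecutive samples. The Python sketch below shows that idea in general terms; it does not describe Sysdig's alerting engine, and the function name and parameters are assumptions for illustration.

```python
def sustained_breach(samples, threshold, min_consecutive):
    """Return True only if `min_consecutive` samples in a row exceed `threshold`."""
    run = 0
    for value in samples:
        # Extend the current run of breaching samples, or reset it.
        run = run + 1 if value > threshold else 0
        if run >= min_consecutive:
            return True
    return False
```

With a 500 ms threshold and `min_consecutive=3`, a single spike in `[100, 900, 120, 110]` does not fire, while `[100, 900, 950, 980]` does.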
Tried and true resource metrics – CPU, memory, and disk usage – are still widely used, with host-based alerts being the most frequently set. Users want to know if the server hosting Docker (physical, VM or cloud instance) is under strain or reaching capacity. The trigger for these alerts is most often set between 80% and 95% utilization.
On the rise, however, are container-focused resource alerts. The most used alerts come mainly in two flavors: 1) resource utilization, and 2) container count.
By default, containers have no resource limits. The fact that customers are increasingly alerting on container limits implies they are using Docker runtime configurations to control how much memory, CPU, or disk I/O containers can use, and that they want to know when usage goes out of bounds and puts application performance at risk.
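A limit-based alert only makes sense for containers that actually have a limit configured. As a small sketch of that logic, assuming usage and limit values have already been collected in bytes (the function name, parameters, and the 90% default are hypothetical):

```python
def near_limit(usage_bytes, limit_bytes, ratio=0.9):
    """Return True when usage has reached `ratio` of the configured limit.

    A `limit_bytes` of None models a container with no limit set, which
    can never trigger a limit-based alert.
    """
    if limit_bytes is None:
        return False
    return usage_bytes >= ratio * limit_bytes
```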
For container count, the concern is typically tied to the fact that users want at least X number of containers of a given type up and functioning to deliver the required service levels, especially in microservices deployments. For example, “I know my app works well when at least three NGINX containers are up. Anything less and I want to know.”
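The NGINX example can be expressed as a minimum-count check per container image. This is a minimal Python sketch under the assumption that the list of running container images is already available; `count_alerts` and its arguments are names invented for illustration.

```python
from collections import Counter


def count_alerts(running_images, minimums):
    """Return {image: running_count} for each image below its required minimum."""
    counts = Counter(running_images)
    return {
        image: counts[image]
        for image, minimum in minimums.items()
        if counts[image] < minimum
    }
```

With two NGINX containers running and a minimum of three, `count_alerts(["nginx", "nginx", "redis"], {"nginx": 3, "redis": 1})` reports only `nginx`.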
Orchestration-focused alerts are also increasingly popular. “Pod Restart Count” tops the list. In a pod, one or more containers are co-located and co-scheduled, typically as a part of a microservice. If a pod restarts too frequently, it indicates a problem that is likely to impact application performance.
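Restart counts are usually reported as cumulative totals, so an alert on "too frequent" restarts needs to compare two readings. The sketch below assumes two snapshots that map pod names to cumulative restart counts; the function name, the snapshot format, and the threshold are illustrative assumptions rather than a description of any particular product.

```python
def restart_alerts(previous, current, max_new_restarts):
    """Return {pod: new_restarts} for pods that restarted more than allowed.

    `previous` and `current` map pod names to cumulative restart counts
    taken at the start and end of the evaluation interval.
    """
    alerts = {}
    for pod, count in current.items():
        # Pods missing from the earlier snapshot are treated as starting at 0.
        delta = count - previous.get(pod, 0)
        if delta > max_new_restarts:
            alerts[pod] = delta
    return alerts
```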
Kubernetes administrators often use event-based alerts as well. These differ from metric-based alerts in that Sysdig looks for event messages generated in the environment, such as a Kubernetes “CrashLoopBackOff” condition, where pods fail and restart repeatedly, or “Liveness probe failed,” which indicates that a container did not pass the check that verifies it is alive and running.
HTTP errors round out the list of top alert conditions. HTTP errors can indicate a problem with software or infrastructure that will ultimately impact performance.
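Conceptually, an event-based alert is a filter over a stream of event messages rather than a comparison against a numeric threshold. A rough Python sketch of that idea follows; the event dictionaries with a `message` key and the `matching_events` helper are hypothetical and do not reflect the Kubernetes or Sysdig event schemas.

```python
# Substrings to watch for in event messages (matched case-insensitively).
WATCHED = ("CrashLoopBackOff", "Liveness probe failed")


def matching_events(events, patterns=WATCHED):
    """Return the events whose message contains any of the watched substrings."""
    lowered = [p.lower() for p in patterns]
    return [
        event
        for event in events
        if any(p in event["message"].lower() for p in lowered)
    ]
```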
Popular alert scopes
Key assessment: Users want to know – How’re my pods doing?
In the 2018 study the most common tags used to scope an alert are tied to Kubernetes. Scoping by pods is the leading choice followed closely by namespace. That’s not to say that physical hosts don’t matter. Our customers say they do. This is revealed by the fact that scoping by host names and tags is a solid third when it comes to choosing an alert scope.
Container-specific scoping is also popular, evenly split across container name, container image, and container ID. Cloud provider tags again rank high on the list in 2018, frequently targeting “name,” “environment,” “ID,” and “region” tags to scope by resource, dev/test and production, application, and location of cloud data center.
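Scoping an alert amounts to selecting only the entities whose tags match a set of key/value pairs. As a minimal sketch of that matching, with tag keys such as `namespace` and `region` chosen purely for illustration:

```python
def in_scope(tags, scope):
    """Return True if every key/value pair in `scope` is present in `tags`."""
    return all(tags.get(key) == value for key, value in scope.items())


def scoped(entities, scope):
    """Filter a list of tag dictionaries down to those matching the scope."""
    return [tags for tags in entities if in_scope(tags, scope)]
```

An empty scope matches everything, which corresponds to an alert applied across the whole environment.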
Custom metrics for application and infrastructure monitoring
Key assessment: There’s no one custom metrics format to rule them all
JMX metrics associated with Java applications were used by 55% of Sysdig SaaS users. This aligns with the fact that we see Java apps are very widely deployed. StatsD comes in at 29% and Prometheus is used by 20% of our SaaS users. With the popularity of Prometheus and its avid community support, we expect this number to grow over time.
Popular container registries
Key assessment: It’s a split decision. Registries are critical but there’s no clear leader
Summary: Momentum and maturity continue with the new stack
The data in this year’s report provides a point of visibility into the momentum behind the solutions that help customers as they build microservices-based services using containers and modern DevOps practices. New approaches are maturing and helping organizations develop applications more quickly to solve real business challenges and compete in the digital marketplace. We’ll be back next year to share what’s changed in the fast moving Docker space.
Would you like a downloadable PDF version of this report? Grab it here.