Operations teams seem to have a love-hate relationship with alerting solutions. You love being notified the minute abnormal behavior or poor performance springs up inside your environment. But you hate being overwhelmed with too many alerts and too much noise. Even “accurate” alerts often lack the context needed to quickly identify root cause, leaving you to sort through a broad stack of opaque services and technologies to find the bottlenecks. And as we all know, the problem is only getting worse: environments are getting more complex every day. Service-oriented architectures, migration to cloud environments, and the introduction of containers into technology stacks mean environments are being built on concepts that didn’t even exist a few short years ago. Over the past 12 months at Sysdig, we’ve talked to hundreds of IT organizations around the world about alerting, and one thing is clear: current alerting solutions are not keeping up with the huge advances being made in operations engineering. Today it is my pleasure to announce our new solution for alerting in the modern era: Intelligent Alerting.
Sysdig’s Intelligent Alerting
Sysdig’s Intelligent Alerting gives operations teams an easy and intuitive way to control how alerts are implemented. The alerting engine requires no additional configuration or plugins: simply install our agent on your host machines, and we immediately provide visibility into all aspects of your infrastructure (servers, applications, network, containers, and databases). You can then alert on that infrastructure in several different ways:
- Self-learning baselines continuously predict the normal performance of your system in order to automatically alert on anomalies.
- Group-based comparisons automatically monitor the behavior of clusters of machines and alert when any node deviates from the standard behavior of the group.
- Manual thresholding allows you to define your own fine-grained alerts with highly configurable conditions on all the different metrics automatically collected from across your application stack.
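To make the three approaches above more concrete, here is a minimal Python sketch of how each kind of check might be evaluated against metric samples. It is not Sysdig’s implementation: the function names, the rolling mean/standard-deviation baseline, the median/MAD group comparison, the sample data, and the default sensitivity `k = 3` are all illustrative assumptions chosen for simplicity.

```python
from statistics import mean, median, pstdev
from typing import Dict, List, Sequence

# Illustrative sketch only: these hypothetical helpers are NOT Sysdig's API or algorithm.


def baseline_alert(history: Sequence[float], current: float, k: float = 3.0) -> bool:
    """Self-learning baseline (simplified): fire when `current` lies more than
    k standard deviations from the mean of recent history."""
    if len(history) < 2:
        return False  # not enough data yet to learn what "normal" looks like
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        return current != mu  # perfectly flat history: any change is anomalous
    return abs(current - mu) > k * sigma


def group_outliers(values: Dict[str, float], k: float = 3.0) -> List[str]:
    """Group-based comparison: return the nodes whose value deviates from the
    group median by more than k times the median absolute deviation (MAD)."""
    med = median(values.values())
    mad = median(abs(v - med) for v in values.values())
    if mad == 0:
        # More than half the nodes sit exactly at the median, so any node that differs stands out.
        return sorted(node for node, v in values.items() if v != med)
    return sorted(node for node, v in values.items() if abs(v - med) > k * mad)


def threshold_alert(current: float, limit: float) -> bool:
    """Manual threshold: fire when the metric exceeds a user-chosen limit."""
    return current > limit


if __name__ == "__main__":
    cpu_history = [40.0, 42.0, 41.0, 39.0, 40.0, 41.0]  # hypothetical CPU % samples
    print(baseline_alert(cpu_history, 95.0))  # True: far outside the learned range
    print(baseline_alert(cpu_history, 41.0))  # False: within normal variation

    cluster = {"node-a": 31.0, "node-b": 29.0, "node-c": 30.0, "node-d": 88.0}
    print(group_outliers(cluster))  # ['node-d']

    print(threshold_alert(95.0, limit=90.0))  # True
```

The group check uses the median rather than the mean because a single misbehaving node cannot drag the median toward itself the way it would drag an average. A real baseline would likely also need to account for recurring patterns such as daily traffic cycles, which this sketch ignores.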
Fine-grained
Sysdig Monitor alerts offer an unprecedented range of specificity, from high-level, dynamic system alerts to deep application monitoring. You can set up alerts on everything from tags we’ve imported from AWS, to container images we’ve auto-discovered inside your environment, to processes running on a node, and even to specific network ports, HTTP status codes, or SQL queries. The list goes on and on.
Natural workflow
Sysdig Monitor’s totally unique workflow lets you create an alert directly from any view in the Sysdig UI. You’ve got to try this to believe it, but trust me: creating alerts for complex systems has never been this easy. Because Sysdig naturally understands the logical groupings of your infrastructure, you can easily set alerts that cover hundreds of machines or processes (even those that aren’t running yet!) to get both a macro- and micro-level view of system performance without managing hundreds of separate alerts. All in just a few clicks. You can also tie alerting into your monitoring and troubleshooting process in new and intuitive ways. For example, if you discover anomalous application behavior in a Sysdig Monitor chart, you can create an alert directly from that chart so the system notifies you whenever that behavior recurs.
Unified Context
Unlike most alerting systems, Sysdig Monitor doesn’t stop at the alert itself. We go one step further, offering you the full context you need to get to root cause. Sysdig stores the full state of your infrastructure with one-second granularity, so once an alert fires you can pause, rewind, and fast-forward through historical data to troubleshoot the issue rapidly. This isn’t limited to the scope of the alert: because Sysdig stores the entire state of your infrastructure, you can see single processes, requests, or network connections in the context of your overall system. No more siloed toolsets for specific areas of your infrastructure: you can monitor, troubleshoot, and alert on every part of your environment from one unified view of system performance.
Use Cases
Let’s dive in and see how easy it is to set up powerful, intelligent alerting inside a distributed environment with Sysdig Cloud! Here are some example alerting scenarios I’ll walk you through:
- Automatic host comparison
- Automatic baselines
- Container images
- Application errors
- Network connections
- Rapid troubleshooting