Pager fatigue is real, and there’s nothing worse than getting woken up in the middle of the night for a useless alert. Sure, memory usage may spike, but you might only want to be jolted awake if database transactions per second drop below a certain threshold AND average request latency rises to an unacceptable level. We’ve added multi-condition alerts to Sysdig Cloud to keep users sleeping peacefully through the night.
Setting Up a Multi-Condition Alert
Creating a multi-condition alert is simple and very similar to creating a general alert. To begin, click the alert icon in any panel, view, or dashboard, or navigate to the alerts tab itself. Once on the alerts menu, switch to advanced mode and click help to get examples and a full list of all metrics.
Advanced Alerting
For advanced alerts we built a powerful alerting syntax that has all the aggregate functions and relational operators you’d expect. Each condition in an alert has five parts:
- Group aggregation (optional) – tells Sysdig Cloud how to aggregate individual data points across a group of nodes. For more details, see the support page on this topic.
- Time aggregation – tells Sysdig Cloud how to aggregate individual data points across a stretch of time.
groupAggregation(timeAggregation(metric.name)) operator value
Here are some examples of advanced alerts:
timeAvg(container.count) != 10
In this example we know for our test environment we should have exactly 10 containers running at any point in time, so we set the alert to fire if the average container count does not equal 10 during the timespan. Timespan is defined within the UI as the last field within step 2.
min(min(cpu.used.percent)) <= 30 OR max(max(cpu.used.percent)) >= 60
This example shows an alert that functions across a group of nodes. The alert triggers if the minimum CPU utilization for any node in the group falls to 30% or below, or if the maximum CPU utilization for any individual node reaches 60% or above.
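To make the nesting concrete, here is a minimal Python sketch of how such a condition could be evaluated. It assumes the inner (time) aggregation runs per node over the timespan and the outer (group) aggregation then runs across nodes, which is how the nested syntax reads; the post does not describe Sysdig Cloud's internals. The `samples` data and the `evaluate` helper are hypothetical and exist only for illustration.

```python
# Hypothetical CPU samples (percent) per node over the alert timespan.
samples = {
    "node-a": [42.0, 45.0, 48.0],
    "node-b": [35.0, 28.0, 33.0],
    "node-c": [50.0, 55.0, 52.0],
}


def evaluate(samples, time_agg, group_agg):
    """Apply time_agg to each node's samples, then group_agg across nodes.

    Assumption: inner aggregation first, outer aggregation second.
    """
    per_node = [time_agg(values) for values in samples.values()]
    return group_agg(per_node)


lowest = evaluate(samples, min, min)   # min(min(cpu.used.percent))
highest = evaluate(samples, max, max)  # max(max(cpu.used.percent))

# The alert fires if either side of the OR is true.
fires = lowest <= 30 or highest >= 60
print(lowest, highest, fires)
```

With this sample data, node-b dips to 28%, so the left-hand condition alone is enough to fire the alert even though no node reaches 60%.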
timeAvg(cpu.used.percent) > 50 AND (timeAvg(mysql.net.connections) > 20 OR timeAvg(memory.used.percent) > 75)
In this alert we wanted to capture the health of our MySQL database across multiple conditions. If CPU usage is high and, at the same time, either network connections are climbing or memory usage is unacceptable, we want to get an alert.
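The grouping in this expression matters. As a rough illustration, the Python sketch below mirrors the same boolean structure on already-aggregated values. The function names and input numbers are hypothetical, and the second function shows Python's own operator precedence only; the post does not say how Sysdig Cloud would parse the expression without parentheses.

```python
def mysql_alert_fires(cpu_avg, connections_avg, memory_avg):
    """Mirror of: cpu > 50 AND (connections > 20 OR memory > 75)."""
    return cpu_avg > 50 and (connections_avg > 20 or memory_avg > 75)


def without_parentheses(cpu_avg, connections_avg, memory_avg):
    # In Python, `and` binds tighter than `or`, so this reads as
    # (cpu > 50 AND connections > 20) OR memory > 75.
    return cpu_avg > 50 and connections_avg > 20 or memory_avg > 75


# High memory alone, with calm CPU and few connections.
print(mysql_alert_fires(30, 5, 90))    # False: CPU condition is not met
print(without_parentheses(30, 5, 90))  # True: memory alone triggers it

# Busy CPU plus rising connections.
print(mysql_alert_fires(60, 25, 40))   # True
```

Keeping the parentheses explicit avoids a wake-up call for a memory spike on an otherwise idle database.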
As with all Sysdig Cloud alerts, a Sysdig Capture can be initiated when the alert fires, giving users a path to deep, system-call-level detail about everything that was happening on each host at that moment.
Multi-condition alerts are just another way Sysdig Cloud makes your life easier. Applying multiple conditions to an alert lets you fine-tune alert specificity and write more detailed playbooks for alert remediation.