Sysdig Advisor: Making Kubernetes troubleshooting effortless

By Harry Perks - MAY 16, 2022

The cloud, Kubernetes, CI/CD, DevOps, GitOps… the last five years have seen a huge transformation in how organizations are architecting and shipping applications. It’s hard to keep up with the pace and learn all of this new tech!

Nearly 55% of respondents to Canonical’s 2021 Kubernetes and cloud native operations report highlighted how the lack of sufficient in-house skills and people power is the biggest challenge that Kubernetes brings to businesses. Let’s be clear – the shift to cloud native, when executed well, allows businesses to enjoy the fruits of their labor, but the human factor is often overlooked.

When operating Kubernetes in practice, platform teams provide an environment in which application developers deploy applications. A lack of skills within these teams shows up as breached SLAs when things go wrong, and that costs money: organizations often provide service credits or refunds when four or five nines of availability are not met. And while automation helps, when there are problems it’s as important to understand where and what to look for as it is to know how to fix the issue.
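To put "four or five nines" in perspective, here is a minimal sketch of the downtime such a target allows. The function name and the 30-day window are illustrative assumptions, not terms taken from any particular SLA.

```python
def downtime_budget_minutes(nines: int, period_days: float = 30.0) -> float:
    """Allowed downtime, in minutes, for an availability target of N nines.

    Assumption: the SLA is measured over a rolling window of `period_days`.
    """
    unavailability = 10 ** (-nines)      # e.g. 4 nines -> 0.0001
    period_minutes = period_days * 24 * 60
    return period_minutes * unavailability


for n in (4, 5):
    print(f"{n} nines: {downtime_budget_minutes(n):.2f} minutes per 30 days")
# 4 nines: 4.32 minutes per 30 days
# 5 nines: 0.43 minutes per 30 days
```

With only a few minutes of budget per month, time spent working out where to look consumes most of the allowance before any fix is applied.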

Accelerate troubleshooting by up to 10x

We’re excited to announce Advisor, a new Kubernetes troubleshooting product in Sysdig Monitor that accelerates troubleshooting by up to 10x. Advisor displays a prioritized list of issues and relevant troubleshooting data to surface the biggest problem areas and accelerate time to resolution.

Sysdig Monitor Advisor - Curated problem prioritization. Highlights problems like CrashLoopBackOff, Pending Pods, CPU Throttling, Node Pressure
Curated problem prioritization directs your attention faster, letting you identify what’s on fire and what needs to be addressed soon.

Troubleshooting Kubernetes needs more than just metrics. For example, when debugging a CrashLoopBackOff, what was the last state of the container? What are the events? What do the container logs say? When an issue is identified, Advisor gives you all the information you need to solve it, removing the need to context-switch between troubleshooting data sources such as logs, dashboards, and the command line (kubectl).
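For comparison, here is a rough sketch of answering just the "last state" question by hand. It assumes a pod object shaped like the output of `kubectl get pod <name> -o json`; the helper name `crashloop_summary` and the sample data are hypothetical, and this is not a description of how Advisor works internally.

```python
from typing import Any, Dict, List


def crashloop_summary(pod: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one entry per container that is waiting in CrashLoopBackOff."""
    findings = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        waiting = cs.get("state", {}).get("waiting") or {}
        if waiting.get("reason") != "CrashLoopBackOff":
            continue
        # lastState.terminated describes how the previous run ended.
        last = cs.get("lastState", {}).get("terminated") or {}
        findings.append({
            "container": cs.get("name"),
            "restarts": cs.get("restartCount", 0),
            "last_exit_code": last.get("exitCode"),
            "last_reason": last.get("reason"),
        })
    return findings


# Hypothetical sample data, trimmed to the fields the helper reads.
SAMPLE_POD = {
    "status": {
        "containerStatuses": [
            {
                "name": "api",
                "restartCount": 7,
                "state": {"waiting": {"reason": "CrashLoopBackOff"}},
                "lastState": {"terminated": {"exitCode": 137, "reason": "OOMKilled"}},
            },
            {"name": "sidecar", "restartCount": 0, "state": {"running": {}}},
        ]
    }
}

for finding in crashloop_summary(SAMPLE_POD):
    print(finding)
```

Events and container logs would still need separate lookups (for example `kubectl describe pod` and `kubectl logs --previous`), which is the kind of context-switching described above.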

All of this information is actionable. The simple user interface surfaces all the important details in a single unified tool with a curated, actionable set of steps for remediating Kubernetes breakages. No digging through knowledge sources such as wikis, Stack Overflow, and blogs.

Identify and understand why a pod is in CrashLoopBackOff in 15 seconds

As soon as the agent is installed, Advisor will automatically identify problems by looking through thousands of different data points with zero configuration required.

Richest data for troubleshooting every type of problem

But of course, things can go wrong for many different reasons. Advisor is a powerful troubleshooting tool for any kind of problem. You can browse your infrastructure, logically grouped by cluster, application, workload, and pod, to understand what’s happening from a 10,000-foot view all the way down to deep network, file, and process metrics derived from syscalls for any pod in your environment. And it’s easy to see the right data: contextualize what you find with open alert incidents, container logs, object descriptions (e.g., kubectl describe pod), a feed of events from Kubernetes and containers, and kube-state metrics.

Because Sysdig Advisor is doing the work for you, developers and other team members who don’t normally get access to kubectl can take advantage of all this information, too. No need to convince your security team to make an exception anymore when troubleshooting a Kubernetes application issue.

Examples of dashboard panels in Sysdig Monitor. Golden Signals. Requests by Status Code.
Golden signals, network, file, and process telemetry with zero app instrumentation

And for platform teams, Advisor helps you ensure your cluster is correctly sized. Be confident that you have enough capacity for new workloads, and that existing workloads aren’t greedy with resources, wasting infrastructure (and money!).

Quickly monitor cluster capacity health, and identify resource status of workloads
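As a back-of-the-envelope illustration of the capacity question, the sketch below compares the CPU requested by workloads against what a node can allocate. The function names and sample figures are assumptions made for illustration; the only Kubernetes detail relied on is that CPU quantities are written either in cores ("2", "0.5") or in millicores ("500m").

```python
from typing import Iterable


def cpu_to_millicores(quantity: str) -> int:
    """Convert a Kubernetes CPU quantity such as "500m" or "2" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    # round() avoids float artifacts such as 0.3 * 1000 == 299.99999999999994
    return round(float(quantity) * 1000)


def cpu_headroom(allocatable: str, requests: Iterable[str]) -> float:
    """Fraction of allocatable CPU that is not yet requested by workloads."""
    total = cpu_to_millicores(allocatable)
    requested = sum(cpu_to_millicores(r) for r in requests)
    return (total - requested) / total


# Hypothetical node with 4 allocatable cores and three workloads.
headroom = cpu_headroom("4", ["500m", "1", "250m"])
print(f"{headroom:.1%} of CPU is still unrequested")
# 56.2% of CPU is still unrequested
```

Comparing requests against actual usage in the same way is one simple method of spotting workloads that reserve far more than they consume.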

Advisor is now available to all customers at no additional cost, and additional troubleshooting features will be added over the coming weeks. We’re always happy to hear how our products are helping you with operational excellence. Reach out to your Sysdig contact or chat with us in-app. Speedy troubleshooting!