Mezmo detects performance issues pre-impact 98% of the time

Mezmo Delivers Higher Uptime and Improved Customer Experience

Rapid insight to address security issues pre-production

Audit trails for efficient troubleshooting and compliance reporting

Fast ramp with dashboard and alert creation across all environments

“With Sysdig, we’re able to resolve incidents faster. We’re able to get insights faster. We’re able to tell when there are performance problems faster. And so as a result, we’re able to deliver a consistent and better customer experience than we otherwise would without Sysdig.”

Ryan Staatz

Systems Architect, Mezmo, Inc.

Company Overview

Mezmo, Inc. provides a centralized log management solution that gives DevOps teams the tools they need to develop and debug their applications with ease. It helps thousands of companies, from startups to large enterprises, take control of their data and gain valuable insights from their logs.

As Kubernetes has matured over the last few years, Mezmo found that a growing number of customers wanted a Kubernetes logging solution. To meet the market need and to take advantage of consistency, scalability, and repeatability, the company made the decision to transition its entire stack to run as microservices on Kubernetes.

According to Ryan Staatz, Systems Architect at Mezmo, “Microservices make things a lot more interesting. There are a lot of things running at the same time. One of the cloud environments we use is Amazon EKS and we really enjoy the fact that it is a managed service. We don’t have to focus on managing the Kubernetes masters and we can just do what we do best, which is running our application stack in Kubernetes.”

By building its environment on Kubernetes, Mezmo gained first-hand Kubernetes expertise that it can share with its customers, including an understanding of the tooling options available to them. The log management company runs more than two dozen Kubernetes clusters, containing over 1,000 workers and 21,000 pods, with a mixture of stateful and stateless applications.

Business Impact

Improved customer experience by reducing time to resolve performance issues
Improved efficiency for compliance and audit processes

Business Need

Deliver Kubernetes logging capabilities to customers
Transition the entire stack to run as microservices on Kubernetes

Mezmo

headquarters

Industry: Software Technology

Infrastructure: Amazon Web Services (AWS)

Orchestration: Amazon Elastic Kubernetes Service (EKS)

Solutions: Sysdig Secure, Sysdig Monitor

Challenges

Being in the data business, Mezmo understands the value of good data. It realized it needed better visibility and security for its AWS environment. Staatz explained, “If someone has access to a privileged container with the right settings, they can access and change the underlying node’s OS and aspects of it. Being able to monitor this is really important. Containers aren’t just free. They’re not just magically secured. And that’s just one reason you need monitoring.”
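The risk Staatz describes comes down to a setting in the pod specification. As a minimal sketch, assuming pod manifests are already loaded as Python dictionaries (the function name and the sample manifest are hypothetical and are not part of Sysdig), a check for privileged containers could look like this:

```python
from typing import Any, Dict, List


def find_privileged_containers(pod: Dict[str, Any]) -> List[str]:
    """Return names of containers in a pod manifest that request privileged mode."""
    spec = pod.get("spec", {})
    flagged = []
    # Init containers can be privileged too, so check both lists.
    for key in ("initContainers", "containers"):
        for container in spec.get(key) or []:
            security = container.get("securityContext") or {}
            if security.get("privileged") is True:
                flagged.append(container.get("name", "<unnamed>"))
    return flagged


# Hypothetical manifest used only for illustration.
sample_pod = {
    "metadata": {"name": "log-agent"},
    "spec": {
        "containers": [
            {"name": "agent", "securityContext": {"privileged": True}},
            {"name": "sidecar"},
        ]
    },
}

print(find_privileged_containers(sample_pod))
```

A static check like this only covers what a manifest declares; the runtime monitoring described in this story is what observes how a container actually behaves.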

After initially building monitoring tooling alongside Prometheus, the Mezmo DevOps team realized the resource drain of scaling Prometheus. “We started trying to manage our own alerts and we found that it was somewhat unwieldy, at least at the time,” explained Staatz. “Having to consistently manage Prometheus over more than a dozen deployments was an operational burden. Using Sysdig to monitor Kubernetes metrics and security allows us to focus on logging.”

After evaluating solutions, Mezmo chose Sysdig. According to Staatz, “Sysdig is our go-to metrics provider for Kubernetes. There is no better choice than Sysdig to meet our monitoring and security needs. Sysdig reports on our entire AWS infrastructure, including application events.”

Mezmo also takes full advantage of the Sysdig alerting feature. “One of the really cool features of Sysdig is that it continuously scans container images and alerts when there is a problem,” said Staatz. “For example, if the developers are alerted to a vulnerability in our version of Lodash, they know that they need to check it out before that code goes to production. These proactive notifications really help our developers that are focused on making updates quickly. With Sysdig, we know if there’s an issue almost immediately after our CI or CD pipeline builds an image that is being shipped into the container registry. Every time we update a dependency, there’s obviously a risk of there being a vulnerability. So we use Sysdig all the time to find these vulnerabilities and alert us.”

“The detection and alerting has been super valuable,” continued Staatz. “And not just for us, but also for our end customers because we get those alerts quickly. It allows us to identify customer-impacting events before the customers, easily 98% of the time. We pinpoint the cause quickly.”

“For example,” continued Staatz, “it could be a resource contention, maybe it’s a security concern, or perhaps it’s just somebody resolving incidents in some way. In general, at the end of the day, the measure of our business value is how effective we are at enabling a better experience for our customer. With Sysdig, we’re able to resolve incidents faster. We’re able to get insights faster. We’re able to tell when there are performance problems faster. And as a result, we’re able to deliver a consistent and better customer experience than we otherwise would without Sysdig.”

Solutions

“There’s no end to all the things you can do with Sysdig!”

Mezmo finds a lot of value in the dashboards. “The segmented line panels and the ability to scope variables make pivoting really helpful,” said Staatz. “One unexpected feature that I use heavily is the ability to choose whether to override the dashboard scope so context can be provided across multiple segments.”

Sysdig provides out-of-the-box dashboards to help organizations get started quickly and point novice users to what they should focus on. The platform also enables companies to build their own dashboards as their operations mature.

“We really appreciate the ability to set up dashboards that are custom-tailored to what we want to look at,” said Staatz. “I recently made a dashboard to track disk usage across mounts for a particular set of pods. Some graphs segment on Kubernetes cluster, others segment on mount point. The scope allows selecting for specific clusters, but you can easily compare the cluster-specific graphs with the segmented per cluster graphs for context if you need that.”

“I also like being able to see alerts overlaid onto these dashboards,” he continued. “This enables us to see if there’s a correlation between alerts and certain activity. This is something I use all the time with networking issues, CPU contention, memory, or even for something simple, like discovering there are too many pending tasks in Elasticsearch. It’s all in one place and super wonderful for us to use. There’s no end to all the things you can do with Sysdig.”

Fast Ramp and Minimal Effort To Maintain

Speaking on the setup and maintenance of Sysdig, Staatz said, “Setup is really simple. You put in some configurations. They run and deploy a daemonset, and you’re good to go. Just like that, all of these metrics are automatically shipped to Sysdig and you can see them in this wonderful UI. And it’s not just regular performance metrics or application metrics that you can see, but you can also observe insights around security as well. I was amazed by the amount of data collected by default by Sysdig. We don’t have to do much at all. It’s already there. We just have to make the dashboards that we want, and even then they have canned ones which is really cool.”

Staatz went on to explain, “The fact that Sysdig is immediately compatible with Kubernetes was a big draw for us. A lot of the security around Kubernetes is new and it’s kind of hard to grapple with at first. We don’t have to do a lot of managing the Sysdig stack, which ultimately makes our lives easier so we can focus on debugging our own stack.”

‘Sysdig Captures’ Makes Compliance Audits Easier

Sysdig provides a feature called Sysdig Captures, which records audit trails in the event of anomalous behavior to help with post-event investigations and troubleshooting. Staatz explained, “A Sysdig Capture is a tcpdump-style of information saved, should there be a targeted area you want to look at. It includes information on the connections being made, things happening, and events.”

With Sysdig as a single source of truth, Mezmo has access to a massive amount of granular data that can be cut and analyzed from any perspective. Sysdig Captures is unique to Sysdig, and beyond the audit file it provides, the depth of its data can’t be matched. As Staatz said, “We get all sorts of information – at host, node, and pod level. Anything you’d ever want to know about the policy event can be logged – and our security team very much appreciates having this in place – especially around compliance and audits and things like that.”

Sysdig Helps Troubleshoot Tough Issues, Faster

Sysdig Captures also provides an audit trail that helps Mezmo when there is anomalous behavior but the container has been killed. “In addition to seeing and alerting on access, with Sysdig, you can also put custom policies in place that will do things,” said Staatz. “For example, we have policies that capture specific information if a certain thing happens so we have the file later to troubleshoot from.”

Speaking about troubleshooting, Staatz explained, “If you exec into a container shell and you start executing commands, Sysdig records that session for you. It’ll alert you so you can proactively know what’s happening immediately. Even when it’s something as innocuous as an SRE going in and doing something to help resolve an incident, you can at least know what’s happening because there is an audit of all of the different connections made to containers.”

The deep data Sysdig collects means there is more data to use when understanding an issue. “We used Sysdig to debug and track down the harder issues we’ve experienced,” said Staatz. “For example, we once had TCP errors between external load balancers and our pods that receive that traffic. Sysdig was instrumental in figuring out where that problem was. We’ve seen things like CPU contention where on the end-of-the-line node there was no high CPU but we saw performance issues for the pods running on it. When we looked closer, the load was sky high, for example. And so we were able to get those sorts of insights just by having Sysdig dashboards and pivoting on those pieces of information.”
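The CPU-versus-load pattern in that last example can be expressed as a simple rule. The following is a rough sketch, not Sysdig's detection logic: the thresholds, field names, and sample values are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class NodeSample:
    name: str
    cpu_utilization: float  # fraction of CPU busy, 0.0 to 1.0
    load_average: float     # 1-minute load average
    cpu_cores: int


def looks_contended(sample: NodeSample,
                    max_cpu: float = 0.5,
                    min_load_per_core: float = 2.0) -> bool:
    """Flag nodes whose load is high even though CPU utilization looks modest."""
    load_per_core = sample.load_average / sample.cpu_cores
    return sample.cpu_utilization < max_cpu and load_per_core >= min_load_per_core


# Hypothetical samples; real values would come from a metrics backend.
samples = [
    NodeSample("node-a", cpu_utilization=0.30, load_average=40.0, cpu_cores=8),
    NodeSample("node-b", cpu_utilization=0.35, load_average=3.0, cpu_cores=8),
]
print([s.name for s in samples if looks_contended(s)])
```

On Linux, load average counts tasks waiting on I/O as well as runnable tasks, which is one way load can climb while CPU utilization stays low.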

For Mezmo, the insight gained from using a single security, compliance, and visibility platform tailored for Kubernetes has made it easier for its security and DevOps teams to solve issues faster. The efficiencies gained from using Sysdig help Mezmo ensure its time and efforts are spent on executing strategic initiatives to better serve its customers and continue to grow its business.

To learn more about Mezmo, visit www.mezmo.com.
