Sysdig

Bloomreach Achieves 350% ROI and Reduces Observability Costs by 40% with Sysdig

40%
Reduction in infrastructure monitoring costs
350%
ROI with an optimized operating environment that saves time, money, and manpower
2000+
Hours saved through increased engineering productivity

Business Need

  • Reduce infrastructure monitoring costs
  • Reduce SRE workload and enable more proactive system management
  • Eliminate information overload and alert fatigue
  • Improve visibility into Kubernetes clusters
“Initially, each of our clusters had its own monitoring stack based on Grafana. The challenges began when we tried to replicate this setup at scale. Costs became a problem, and our SRE had to spend more and more time on upkeep.”
Matteo Giusto

Company Overview

Without visibility and observability, maintaining a reliable cloud platform is functionally impossible.

Bloomreach understands this firsthand. An industry leader in the personalization of the e-commerce experience, they aim to streamline the entire customer journey from top of funnel to post-purchase. Bloomreach’s customers can seamlessly manage omnichannel marketing, product discovery, and content creation, all while enabling deeper, more effective personalization.

Industry: Software Technology

Infrastructure: Amazon Web Services (AWS)

Orchestration: Amazon Elastic Kubernetes Service (EKS)

Solution: Sysdig Monitor, Sysdig Secure

Challenges

Growing Pains

Bloomreach’s Kubernetes environment is relatively complex, with both multi- and single-tenant clusters. Originally, each of these clusters had its own open source Prometheus-based monitoring stack. If you wanted to look at a cluster, you needed to point your browser to one of many Prometheus data sources.

While this would work well enough for a smaller company with minimal growth, Bloomreach was anything but. Identified in 2022 as one of the fastest-growing private companies in North America, Bloomreach quickly reached a tipping point with its existing monitoring stack. The manual work and lack of a unified view became untenable at scale.

“We started running into issues with correlating things on different clusters, and also reliability,” explained Matteo Giusto, Senior Engineering Manager at Bloomreach. “This made it difficult to stay on top of customer service-level agreements as we continued to grow.”

The monitoring stack was also incredibly time-consuming to manage. Each customer had its own cluster, which required the manual configuration and deployment of a monitoring dashboard. Bloomreach’s SRE team spent an inordinate amount of time simply keeping the lights on.

“Personally, running your own Prometheus cluster is cool from an engineering perspective, but when you have to do it at scale, it’s a full-time job,” Giusto said. “Keeping up with all of the maintenance, the changes, and the upgrades is really difficult. Then you have to scale up the team just to maintain the control plane – we didn’t really see the value of investing in that.”

Finally, there was the dual issue of alert fatigue and information overload. There were simply too many different metrics and dashboards to keep track of. The number of notifications was similarly overwhelming.

Because there was too much noise to effectively prioritize and act on notifications, multiple key performance indicators were impacted, including mean time to resolution. This ultimately drove Bloomreach to seek out a vendor to address its observability challenges.

“Having a monitoring and observability platform that’s flexible enough to ingest new metrics, create new dashboards, and evolve while still maintaining reliability is crucial. But that platform also can’t eat up all of the time for your infrastructure team. Sysdig understands that, and it shows.”
Matteo Giusto

Solutions

A Scientific Approach to Selection

Having identified the need for a third-party observability solution, Bloomreach created a list of priority features around which it would focus its search:

  • A fixed pricing model
  • Prometheus-compatible cloud-native observability
  • Kubernetes monitoring with intuitive visualization
  • Streamlined, configurable alerts
  • Support for custom metrics
  • Unified visibility of all clusters across the environment for seamless availability and performance monitoring

Bloomreach performed a proof of concept with multiple vendors, and ultimately the company chose Sysdig Monitor. In part, this was because Sysdig Monitor is the first commercially available cloud monitoring platform fully compatible with Prometheus.

Predictable pricing, best-in-class customer support, and Sysdig’s status as a preferred independent software vendor partner of Amazon Web Services (AWS) were also major draws, as were Sysdig’s cost-containment capabilities.

An Empowering Partnership

Bloomreach got Sysdig up and running quickly by deploying it with Terraform, the tool the company already uses to automate and scale its deployments. The team was also able to recreate existing dashboards quickly by customizing Sysdig’s out-of-the-box dashboards.

“I think the most time-consuming step was identifying which of our own dashboards we needed to migrate to monitor through Sysdig,” Giusto said. “The deployment and configuration were a piece of cake.”

Sysdig’s support proved instrumental throughout this process as well.

“The collaboration that we had with Sysdig was really a game-changer, both before the sale and after the sale,” Giusto said. “Sysdig was actually interested in knowing what we were doing and adjusted its roadmap to our needs.”

Bringing Costs Under Control

With Sysdig Monitor, Bloomreach has gained better visibility into which metrics are being used, alongside substantial savings on the unit cost of custom metrics. As a result, Bloomreach has reduced the infrastructure monitoring costs for its Experience Manager product by over 40%. Unlike another department of the company that uses an alternative solution, the SRE team does not need to worry about massive, unexpected bills.

“We’ve been able to reduce more than 40% of the cost that was spent on infrastructure monitoring just by running the Sysdig agent and pushing the metrics to Sysdig,” Giusto said. “Overall, I’d estimate our return on investment to be somewhere in the area of 350%.”
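To make the two headline percentages concrete, the sketch below applies the standard definitions of cost reduction and ROI (net benefit divided by cost). The source does not say how Giusto arrived at his estimate, and the amounts used here are hypothetical placeholders, not Bloomreach figures.

```python
def cost_reduction_percent(old_cost: float, new_cost: float) -> float:
    """Percentage by which spending fell from old_cost to new_cost."""
    return (old_cost - new_cost) * 100 / old_cost


def roi_percent(total_benefit: float, total_cost: float) -> float:
    """Standard ROI: net benefit (benefit minus cost) relative to cost."""
    return (total_benefit - total_cost) * 100 / total_cost


# Hypothetical amounts, chosen only to illustrate the quoted percentages.
reduction = cost_reduction_percent(old_cost=100_000, new_cost=60_000)
roi = roi_percent(total_benefit=450_000, total_cost=100_000)
print(f"cost reduction: {reduction:.0f}%, ROI: {roi:.0f}%")
```

Read this way, an ROI of 350% means the estimated benefits are 4.5 times what was spent, leaving a net gain of 3.5 times the investment.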

Bloomreach has also freed its SRE team from the bottleneck of its old monitoring solution, leaving the team free to focus on revenue-generating projects. In addition, the company plans to use Cost Advisor within Sysdig Monitor to gain a deeper understanding of its cloud resources and potentially unlock further savings.

“Cloud costs are still a blind spot,” Giusto said. “We spend a lot of time trying to track down costs across multiple teams and multi-tenant clusters. With Sysdig Cost Advisor, we will be able to better understand the utilization and cost of our Kubernetes environments. Having all of the information in one place will enable us to do showbacks and chargebacks, and will ultimately save us money.”
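For readers unfamiliar with showback, the idea is to attribute the cost of a shared cluster to the teams that consume it. The sketch below is one simple way to do that, splitting a cluster's cost in proportion to each namespace's usage. It is an illustration only and does not describe how Sysdig Cost Advisor works; the function name, namespaces, team names, and figures are all hypothetical.

```python
from collections import defaultdict


def showback(cluster_cost, usage_by_namespace, team_of_namespace):
    """Split one cluster's cost across teams in proportion to namespace usage.

    Namespaces with no known owner are reported under "unallocated".
    """
    total_usage = sum(usage_by_namespace.values())
    if total_usage <= 0:
        raise ValueError("total usage must be positive")

    cost_by_team = defaultdict(float)
    for namespace, usage in usage_by_namespace.items():
        team = team_of_namespace.get(namespace, "unallocated")
        cost_by_team[team] += cluster_cost * usage / total_usage
    return dict(cost_by_team)


# Hypothetical usage (for example, CPU core-hours) on a multi-tenant cluster.
usage = {"search": 60.0, "content": 30.0, "batch": 10.0}
owners = {"search": "team-a", "content": "team-b"}
report = showback(1000.0, usage, owners)
for team, cost in sorted(report.items()):
    print(f"{team}: {cost:.2f}")
```

A chargeback uses the same breakdown but bills each team for its share instead of only reporting it. Keeping an explicit "unallocated" bucket makes untracked spending visible rather than silently spreading it across teams.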

Scaling to Success

Through Sysdig Monitor, Bloomreach has improved both performance and availability across its solutions portfolio. They’ve positioned themselves for both current and future growth, while simultaneously improving their own customer experience.

And they’ve managed to achieve all of this without overwhelming their team with unnecessary data or false positives. Users only receive relevant alerts, and the company is currently exploring further opportunities to reduce workloads through automation.

“We’re always looking to improve our platforms and solutions, and observability plays a key role,” Giusto said. “With Sysdig, we’re able to both reduce complexity and react faster to customer requests. It puts us in a very good position.”

In addition, Bloomreach recently decided to expand its use cases by adding Sysdig Secure. Because Sysdig runtime insights show what is actually in use in production, Bloomreach can make better-informed decisions across environments and the software life cycle. From prevention to defense, Sysdig Secure brings risk-based prioritization to streamline vulnerability management, permissions management, posture management, and threat detection and response.

“We needed a solution that gave us deep visibility into various deployments across our multi-cloud environment – AWS and Google Cloud,” Giusto said. “Consistency is key, given the diversity of our environments. That’s why we chose Sysdig; its runtime insights provide the necessary context to prioritize actual risks.”

“Sysdig automatically correlates running processes with vulnerabilities, offering a technical setup that aligns with our operational model,” he said. “This allows the security operations center team to easily manage the system while providing ultimate oversight to the security team.”

To learn more about Bloomreach, visit bloomreach.com.

“The first thing that comes to my mind is that nobody on the team has to question if our monitoring is working. There’s no maintenance required, and we know that we have a reliable solution. If someone is on call in the middle of the night and gets notified, they can rely on the Sysdig dashboard to see what’s going on in all of our clusters.”
Matteo Giusto

About Sysdig

In the cloud, every second counts. Attacks move at warp speed, and security teams must protect the business without slowing it down. Sysdig stops cloud attacks in real time, instantly detecting changes in risk with runtime insights and open source Falco. We correlate signals across cloud workloads, identities, and services to uncover hidden attack paths and prioritize real risk. From prevention to defense, Sysdig helps enterprises focus on what matters: innovation.

Sysdig. Secure Every Second.
