Site Reliability Engineer


Sysdig is driving the standard for securing the cloud and containers. We created Falco, the open standard for cloud-native threat detection, and consistently contribute to open source software projects. We are passionate, technical problem-solvers, continually innovating and delivering powerful solutions to secure the cloud from source to run.

We value diversity and open dialog to spur ideas, working closely together to achieve goals. We’re an international company that understands how to cultivate a strong culture across a remote team. And we're a great place to work too — we've been named a Bay Area Best Place to Work by the San Francisco Business Times and the Silicon Valley Business Journal for three years now! We were recognized by Deloitte as one of the 500 fastest growing organizations in 2020 and 2021. We are looking for team members who have a passion for container and cloud security and are willing to dig deeper to help our customers. Does this sound like the right place for you?

Your Opportunity

As a Site Reliability Engineer on our Infrastructure team, you will help improve Sysdig's provisioning, monitoring, and cloud platform management. You have an aptitude for analytical and creative problem solving, and you are excited to use automation to manage the stability, availability, and scale of our infrastructure.

Your Responsibilities:

You will join a highly skilled and globally distributed team of SREs, and you can expect to:

  • Build solutions to enhance the observability, availability, performance, and resilience of the Sysdig SaaS and On-Premise products
  • Implement reliability improvement initiatives, including performance tuning and infrastructure optimization
  • Maintain and support the production environments and communicate directly with customer stakeholders
  • Participate in an on-call rotation with other SREs

Your Background

  • Experience managing Kubernetes clusters in a production environment
  • Solid understanding of Linux systems and networking
  • Proficiency with infrastructure as code/configuration management tools. We love Terraform, but you may have experience with Ansible, Chef, Puppet or SaltStack
  • Familiarity with monitoring tools such as Sysdig, Prometheus, Nagios, Icinga, Zabbix
  • Experience managing multi-tenant solutions with Cassandra, Elasticsearch, Kafka or Redis
  • Proficiency with SQL relational databases, preferably PostgreSQL and MySQL
  • Command of a scripting language such as Python or Bash
  • Knowledge of CI/CD concepts; hands-on experience is a strong plus
  • Experience supporting a customer-facing product hosted in a public or private cloud ecosystem
  • Experience diagnosing and troubleshooting complex problems in high-throughput web applications and network services
  • Strong sense of ownership and a focus on customer delight 

Key Technologies

Kubernetes, Docker, Python, Cassandra, Kafka, Elasticsearch, Redis, Terraform, PostgreSQL, AWS

Why work at Sysdig?

  • We’re a well-funded startup that already has a large enterprise customer base
  • We have a pragmatic, transparent culture, from the CEO down
  • We have an organizational focus on delivering value to customers
  • Our open source tools are widely used and loved by technologists & developers

When you join Sysdig, you can expect:

  • Competitive compensation including equity opportunities
  • Flexible hours and additional recharge days
  • Mental wellbeing support through Modern Health for you and your family
  • Monthly wellness reimbursement
  • Career growth

Some of our hiring managers are globally distributed, so an English version of your CV will be highly appreciated!



Are you ready to join us?

We're excited to receive your application.