Site Reliability Engineer (US – Remote)

US - Remote

Sysdig is the secure DevOps company, and we’re at the forefront of the container, Kubernetes, and cloud revolution. We are passionate, technical problem-solvers, continually innovating and delivering powerful solutions to confidently run cloud-native applications. Our consistent contributions to open source software projects reflect our commitment to the open cloud movement.

We value diversity and open dialog to spur ideas, working closely together to achieve goals. And we're a great place to work too — we were awarded the 2021 Bay Area Best Places to Work Award from San Francisco Business Times and the Silicon Valley Business Journal. We are looking for team members who share our commitment to customers and are willing to dig deeper, understand problems and deliver innovative solutions. Does this sound like the right place for you?

As a Site Reliability Engineer, you’ll be responsible for the availability, performance, and resilience of the Sysdig platform in our largest on-premises customer environments. You will collaborate with high-performing infrastructure and engineering teams, both within Sysdig and in customer organizations, to help drive the scalability and stability of our platform.

What you will do:

  • Work as part of a globally distributed team of Site Reliability Engineers, supporting multiple Sysdig applications across our most critical on-premises customers.
  • Produce best-practice recommendations that help on-premises customers improve their experience.
  • Implement disaster recovery and reliability improvement initiatives, including performance tuning and infrastructure optimization.
  • Maintain and support production environments and communicate directly with customer stakeholders.
  • Participate in an on-call rotation.

What you will bring with you:

  • Minimum of 5 years of industry experience
  • Prior experience in:
    • Deploying Kubernetes workloads in production
    • Diagnosing and troubleshooting customer-facing production service outages
    • Writing applications or automation in Python, Golang, or Bash
    • Using version control tools such as Git/GitHub
  • Hands-on experience managing at least one of the following clusters, including installation, configuration, optimization, high-availability improvements, failover, and backup/restore:
    • Cassandra, Elasticsearch, Kafka/ZooKeeper, PostgreSQL

What we look for:

  • Strong sense of ownership
  • Strong desire to earn customer trust and obsess over customers
  • Proven ability to work across, collaborate with, and negotiate with diverse, distributed, or remote teams
  • Proven ability to work under pressure
  • Strong desire to coach or share information with others
  • Knowledge of Helm, Terraform, Prometheus, and Grafana is preferred
  • Knowledge of Kubernetes Operators is a big plus

Key Technologies
Kubernetes, Golang, Python, Cassandra, Kafka, Elasticsearch, PostgreSQL, Terraform, Helm

Why work at Sysdig?

  • We’re a well-funded startup that already has a large enterprise customer base
  • We have a pragmatic, approachable culture, from the CEO down
  • We have an organizational focus on delivering value to customers
  • Our open source tools are widely used and loved by technologists and developers

When you join Sysdig, you can expect:

  • Competitive compensation package
  • Top-notch health insurance coverage

Additionally, we offer a variety of benefits and perks, such as:

  • 401(k) with a company match of up to 3%
  • Flexible vacation policy

Are you ready to join us?

We're excited to receive your application.