Site Reliability Engineer at Blameless

Closed job - No longer receiving applicants

Blameless is an end-to-end Site Reliability Engineering (SRE) platform that enables industry-leading reliability practices so engineering teams can deliver customer happiness with consistency and ease. Our platform helps engineering teams set and monitor SLOs, orchestrate incident response, identify contributing factors, and create a culture of learning and improvement across organizations. Blameless, based in San Mateo, is backed by Accel and Lightspeed Ventures.

Job functions

We’re looking for an experienced Site Reliability Engineer to join our SRE team and contribute to a software platform that helps organizations and engineers streamline their reliability efforts. Our SREs play an integral role in architecting, building and iterating on our resilient, scalable systems.

You’ll join our passionate team and help us launch reliable features and services. Want to work with the latest and greatest tools and frameworks to support our growing backend and infrastructure needs? Are you self-motivated and comfortable working in a fast-paced environment? Then SRE at Blameless is the place for you!

In this role, you will:

  • Help manage our cloud infrastructure (AWS, GCP and Azure) using scripting and coding (Bash, Python and Go), infrastructure-as-code (Terraform) and the cloud-native ecosystem (Kubernetes, Prometheus, Helm, etc.)
  • Develop infrastructure to deploy onto, ensuring scalability and resiliency
  • Improve processes so we can deploy quickly while maintaining high-quality systems
  • Build tools and automation to fill the gaps in our current systems
  • Assist with incidents and support our Engineering team in dogfooding our Blameless incident orchestration
  • Conduct postmortems to ensure constant improvement

Requirements:

  • 3+ years of experience as a Site Reliability Engineer
  • 2+ years of experience with Kubernetes and managing cloud resources
  • 2+ years of managing cloud infrastructures (AWS, GCP, or Azure)
  • 2+ years of experience working with Bash, Python, or Go
  • Experience working with Terraform or similar tools
  • Experience collecting and processing metrics from tools such as Prometheus, Datadog, or New Relic
  • A strong understanding of SLOs and SLIs
  • Experience building and working on deployment systems

The Impact of this Role:

  • Help us design and build our early-stage company
  • Accelerate our efforts towards product-market fit
  • Influence an open, productive and effective culture
  • Learn about operating a startup, fund-raising and scaling teams
  • Grow your career exponentially by joining a rocket ship

Remote work policy

Locally remote only

The position is 100% remote, but candidates must reside in Chile, Argentina or the United States.
