Site Reliability Engineer at Blameless

Closed job - No longer accepting applications

Blameless is an end-to-end Site Reliability Engineering (SRE) platform that enables industry-leading reliability practices so engineering teams can deliver customer happiness with consistency and ease. Our platform helps engineering teams set and monitor SLOs, orchestrate incident response, identify contributing factors, and create a culture of learning and improvement across organizations. Blameless, based in San Mateo, is backed by Accel and Lightspeed Ventures.

Apply exclusively at getonbrd.com.

Job functions

We’re looking for an experienced Site Reliability Engineer to join our SRE team and contribute to a software platform that helps organizations and engineers streamline their reliability efforts. Our SREs play an integral role in architecting, building and iterating on our resilient, scalable systems.

You’ll join our passionate team and help us launch reliable features and services. Want to work with the latest and greatest tools and frameworks to support our growing backend and infrastructure needs? Are you self-motivated and comfortable working in a fast-paced environment? Then SRE at Blameless is the place for you!

In this role, you will:

Help manage our cloud infrastructure (AWS, GCP and Azure) using scripting and coding (Bash, Python and Go), infrastructure-as-code (Terraform) and the cloud-native ecosystem (Kubernetes, Prometheus, Helm, etc.)
Develop the infrastructure we deploy onto, ensuring scalability and resiliency
Improve processes to help us deploy quickly while maintaining high-quality systems
Build tools and automation to fill the gaps in our current systems
Assist with incidents and support our Engineering team in dogfooding our Blameless incident orchestration
Conduct postmortems to ensure constant improvement

Requirements:

3+ years of experience as a Site Reliability Engineer
2+ years of experience with Kubernetes and managing cloud resources
2+ years of experience managing cloud infrastructure (AWS, GCP, or Azure)
2+ years of experience working with Bash, Python, or Go
Experience working with Terraform or similar tools
Experience collecting and processing metrics from tools such as Prometheus, Datadog, or New Relic
A strong understanding of SLOs and SLIs
Experience building and working on deployment systems

The Impact of this Role:

Help us design and build our early-stage company
Accelerate our efforts towards product-market fit
Influence an open, productive and effective culture
Learn about operating a startup, fund-raising and scaling teams
Grow your career exponentially by joining a rocket ship

GETONBRD Job ID: 25763

Remote work policy

Locally remote only

The position is 100% remote, but candidates must reside in Chile, Argentina or the United States.