Backblaze is seeking a Sr. Site Reliability Engineer to ensure the stability, scalability, and reliability of our services and infrastructure. The role involves building automation, maintaining observability, and supporting incident response to keep customer-facing systems performing at their best.

Requirements

Bachelor's degree in Computer Science, Engineering, or related field (or equivalent experience)
8+ years of progressive experience in site reliability, systems engineering, or operations
Extensive experience designing, scaling, and operating large-scale, production-grade distributed systems
Expert-level Linux systems administration and advanced troubleshooting skills
Experience leading security-minded operations, including system-wide patching, hardening, and proactive vulnerability identification
Deep mastery of service reliability concepts, including advanced monitoring, complex alerting strategy, incident response leadership, and in-depth root cause analysis
Advanced proficiency in at least one modern scripting/programming language (Python or Go strongly preferred)
Expert knowledge of incident response methodologies and operational best practices
Proven experience designing and operating container orchestration (Kubernetes, Docker), plus working knowledge of microservices concepts
Expert experience with HashiCorp products (Nomad, Vault, Terraform) in a production environment

Benefits

Healthcare for family, including dental and vision
Competitive compensation and 401(k)
RSU grants for full-time employees
Employee stock purchase plan (ESPP)
Flexible vacation policy
Maternity & paternity leave
MacBook Pro to use for work, plus a generous stipend to personalize your workstation
Childcare bonus (human children only)
Fertility treatment and support
Learning & development program
Commuter benefits
Culture that supports a healthy work-life balance
