Backblaze is seeking a Sr. Site Reliability Engineer to ensure the stability, scalability, and reliability of our services and infrastructure. The role involves building automation, maintaining observability, and supporting incident response to keep customer-facing systems performing at their best.
Requirements
- Bachelor's degree in Computer Science, Engineering, or related field (or equivalent experience)
- 8+ years of progressive experience in site reliability, systems engineering, or operations
- Extensive experience designing, scaling, and operating large-scale, production-grade distributed systems
- Expert-level Linux systems administration and advanced troubleshooting skills
- Experience leading security-minded operations, with a focus on system-wide patching, hardening, and proactive vulnerability identification
- Deep mastery of service reliability concepts, including advanced monitoring, complex alerting strategy, leading incident response, and in-depth root cause analysis
- Advanced proficiency in at least one modern scripting/programming language (Python or Go strongly preferred)
- Expert knowledge of incident response methodologies and operational best practices
- Proven experience designing and operating containerized workloads and orchestration platforms (Docker, Kubernetes), with a solid grasp of microservices architecture (required)
- Expert-level experience with HashiCorp products (Nomad, Vault, Terraform) in a production environment
Benefits
- Healthcare for family, including dental and vision
- Competitive compensation and 401(k)
- RSU grants for full-time employees
- Employee Stock Purchase Plan (ESPP)
- Flexible vacation policy
- Maternity & paternity leave
- MacBook Pro to use for work, plus a generous stipend to personalize your workstation
- Childcare bonus (human children only)
- Fertility treatment and support
- Learning & development program
- Commuter benefits
- Culture that supports a healthy work-life balance