At Klaviyo, we value the unique backgrounds, experiences, and perspectives each Klaviyo (we call ourselves Klaviyos) brings to our workplace every day. We believe everyone deserves a fair shot at success, and we appreciate the experiences each person brings beyond the traditional job requirements. As a Lead Site Reliability Engineer, you will set technical direction and lead reliability strategy for Klaviyo’s most critical platforms.
Responsibilities
- Set the technical vision and long-term strategy for reliability, availability, and operational excellence across critical platforms
- Lead the design, implementation, and evolution of foundational, security-critical services with strong guarantees around availability, scalability, latency, and fault tolerance
- Drive adoption of SRE best practices across engineering teams, including SLIs, SLOs, error budgets, and reliability-based decision-making
- Identify systemic reliability risks and architectural bottlenecks, and lead cross-team initiatives to address them with durable, preventative solutions
- Apply software engineering principles to automate infrastructure, eliminate operational toil, and improve system reliability at scale
- Own and continuously improve observability, alerting, and incident response practices to reduce mean time to detection and recovery
- Guide on-call strategy and operational processes to ensure sustainability, automation, and healthy operational load
- Perform and lead quantitative analysis of system behavior, capacity planning, scaling limits, and performance characteristics
- Partner closely with product, platform, and security leaders to influence system architecture early and ensure reliability is built in from the start
- Lead incident response for high-severity events, driving effective mitigation, communication, and follow-up
- Mentor senior and mid-level engineers, raising the bar for technical quality, operational maturity, and reliability culture across the organization
- Review and influence technical designs, platform APIs, operational runbooks, and system documentation at an organizational level
Benefits
- Generous Paid Time Off
- 401(k) Matching
- Tuition Reimbursement
- Relocation Assistance