Okta is seeking a Senior Site Reliability Engineer to own and evolve our observability ecosystem. The ideal candidate has a strong SRE mindset, at least 4 years of experience in SRE, DevOps, or Systems Engineering, and expertise in tools and languages such as Terraform, Go, Python, or Ruby. The role involves designing, building, and maintaining scalable observability infrastructure; optimizing the collection and storage of telemetry data; and leading incident response efforts.
Responsibilities
- Automated Infrastructure: Design, build, and maintain scalable observability infrastructure using tools like Terraform.
- Pipeline Engineering: Optimize the collection, processing, and storage of telemetry data (Metrics, Logs, Traces) to ensure high reliability and low latency.
- Incident Response: Participate in on-call rotations and lead post-incident reviews to drive systemic improvements and 'observability-driven development.'
- Automation: Eliminate 'toil' by automating the deployment and scaling of observability agents and collectors.
Benefits
- Generous Paid Time Off
- 401(k) Matching
- Retirement Plan
- Visa Sponsorship
- Four Day Work Week
- Generous Parental Leave
- Tuition Reimbursement
- Relocation Assistance