We are seeking a seasoned Platform / Site Reliability Engineer to lead the evolution of our Kubernetes-based production environment and improve its performance.
Requirements
- Maintain, scale, and enhance our Kubernetes production platform
- Improve the performance, reliability, and availability of production systems
- Ensure comprehensive observability across services, infrastructure, and applications (metrics, logs, traces, alerts)
- Diagnose performance issues and tune databases and services across distributed systems
- Partner with Development teams to assess the platform impact of new features and architectural changes
- Identify architectural bottlenecks and contribute to scalable, resilient technical solutions
- Collaborate with Support teams to build and maintain operational runbooks and improve incident response processes
Benefits
- Equal opportunity employer
- Committed to creating a diverse environment