We are seeking a highly experienced Site Reliability Engineering (SRE) Manager to lead and scale the team responsible for our core Observability Platform.
Requirements
- 3+ years of engineering management experience leading SRE, Platform, or Observability-focused teams.
- 5+ years of hands-on experience in Site Reliability Engineering, DevOps, or Software Engineering.
- Deep domain expertise in designing, building, and operating high-scale observability platforms (metrics, logging, and tracing).
- Strong technical background, including experience with cloud platforms (AWS, GCP, or Azure) and Infrastructure as Code (IaC) tools (e.g., Terraform, Ansible).
- Proficiency in a programming or scripting language (Python, Go, or similar) for automation and tooling development.
- Proven ability to set technical direction, influence technical strategy, and drive architectural decisions in a complex environment.
- Excellent communication, interpersonal, and stakeholder management skills.