Everbridge is seeking a Staff Platform Site Reliability Specialist to own, operate, and evolve our enterprise observability platform.
Requirements
- Observability Platform Ownership
  - Lead the design, operation, and evolution of Everbridge’s observability stack
  - Build and maintain a highly available, scalable observability platform
  - Standardize instrumentation, dashboards, alerts, and SLOs
  - Support incident response, root cause analysis, and capacity planning
- Grafana Stack & Telemetry
  - Operate and scale Grafana and related technologies:
    - Grafana Loki (logs)
    - Grafana Mimir (metrics)
    - Grafana Tempo (tracing)
    - Grafana Alerting
- Kubernetes
  - Maintain the reliability and security of the EKS clusters running the observability platform
  - Manage cluster lifecycle and upgrades
- Infrastructure as Code & Automation
  - Terraform for infrastructure provisioning
  - HashiCorp Packer
  - GitLab CI/CD at scale
Benefits
- Healthcare
- Dental care
- Mental health benefits
- Disability income benefits
- Life and AD&D insurance
- Retirement savings plan with employer match
- Paid time off