Join Qube Research & Technologies as a Senior Site Reliability Engineer (SRE) to improve reliability and day-to-day operability for an actively used and growing engineering platform.
Requirements
- Strong practical experience applying Site Reliability Engineering principles in production environments
- Strong Linux systems knowledge
- Experience building and operating containerised workloads (Docker or Podman)
- Strong development experience in Go (preferred) or Python
- Strong experience querying and reasoning about metrics using PromQL
- Hands-on experience with Grafana, including dashboarding and alerting
- Experience deploying and operating centralised logging systems
- Strong Infrastructure as Code experience
- OpenTelemetry experience (metrics, logs, traces)
- Terraform and/or Ansible experience, plus familiarity with CI/CD pipelines
- Kubernetes and cloud-native platform experience
- Exposure to datacentre services and compute/hardware-backed platforms
- AWS infrastructure configuration and deployment experience
- Evidence of reducing operational load and recurring incidents in growing systems
Benefits
- Initiatives and programs to enable employees to achieve a healthy work-life balance