We are looking for an experienced Observability / Reliability Engineer to lead the strategy and implementation of our monitoring, logging, and observability across multiple projects.
Requirements
- 7+ years of experience in observability engineering, SRE, infrastructure monitoring, or related reliability-focused roles, including hands-on implementation work.
- Strong understanding of observability fundamentals, including metrics, logs, traces, telemetry correlation, and performance analysis.
- Experience with modern observability tools and stacks such as Prometheus, Grafana, OpenTelemetry, Elastic Stack, Datadog, Splunk, New Relic, or equivalent platforms.
- Practical experience improving alert quality, monitoring strategy, or service visibility in production environments.
- Familiarity with service reliability concepts including service journeys, SLIs, SLOs, alert thresholds, and incident detection approaches.
- Strong systems thinking and troubleshooting skills, with the ability to translate operational problems into scalable technical solutions.
- Collaborative approach, with clear communication across technical and non-technical stakeholders.
- Experience working with external vendors or cross-functional teams.
Benefits
- We promote a learning culture and encourage you to grow.
- We are a wholly owned subsidiary of GovTech.