Join our high-performing Site Reliability Engineering (SRE) team and play a pivotal role in ensuring the reliability, scalability, and performance of the technology powering Man Group's hedge funds.
Requirements
- Strong understanding of SRE principles, including SLIs, SLOs, error budgets, and reliability best practices.
- Hands-on experience with observability and monitoring tools (Prometheus, Grafana, ELK, Loki, or similar).
- Proficiency with automation tools (Ansible, Terraform) and scripting/programming languages (Python, Go, PowerShell).
- Strong troubleshooting and debugging skills across distributed systems, with the ability to diagnose complex production issues under pressure.
- Experience with incident management, on-call rotations, and post-incident reviews.
- Familiarity with Kubernetes and container orchestration.
- A proactive mindset and ability to take ownership of reliability initiatives.
Benefits
- Modern office on the OfficeX campus, with easy access to transport and amenities.
- Hybrid working model
- Competitive compensation package
- 25 days' holiday allowance
- Premium health insurance
- Employee assistance program
- Referral bonus
- Additional days off for long service and volunteering
- Multisport card
- Opportunities for professional development, including internal tech talks, conference attendance, and engagement with the open-source community.