Tyk is on a mission to connect every system in the world. We're looking for a Senior Site Reliability Engineer to optimize, automate, and improve our performance, using insights from massive-scale data in real time.
Requirements
- Lead hands-on maintenance and optimization of our global Cloud platform within agreed SLAs, SLIs, and SLOs
- Collaborate to shape SRE strategy, then translate it into actionable technical plans coordinated through Scrum
- Identify reliability issues, drive root cause analysis, and implement solutions alongside your squad
- Lead performance tuning and fault finding through analysis of OS and application metrics
- Design and implement automation for common operational tasks and cloud-operations workflows
- Develop proactive alerting, a monitoring roadmap, and relevant dashboards; define and track KPIs
- Participate in the on-call rotation, ensuring effective incident response and resolution within SLAs
- Conduct blame-free postmortems, document findings, and maintain operational runbooks
- Drive multi-region and multi-cloud platform expansion with a focus on scalability and automation
- Optimize infrastructure performance and cost efficiency without impacting service delivery
- Engage with commercial teams on growth plans and translate them into technical SRE strategies
- Coordinate penetration testing through provider liaison, technical setup, and environment configuration
- Champion continuous improvement across processes, communication, and team practices
- Model excellence in software design and knowledge sharing
- Plan and execute software upgrades to enhance cloud services
Benefits
- Unlimited paid holidays for everyone
- Employee share scheme
- Generous maternity and paternity leave
- Volunteering days
- Employee wellbeing platform