At CV-Library, we are looking for a Site Reliability Engineer to help design and operate our high-traffic web platforms and cloud-native services, and to improve their reliability, across AWS and hybrid environments.
Responsibilities
- Manage and optimise AWS infrastructure including EC2, EKS, RDS, Aurora, S3, VPC, IAM, Route 53 and CloudWatch
- Improve system reliability, availability and resilience across production services
- Define and improve operational practices including monitoring, alerting and incident management
- Drive improvements in observability across metrics, logging and tracing
- Participate in incident response and post-incident reviews, helping prevent recurrence
- Contribute to capacity planning and performance optimisation
- Operate and improve containerised workloads using Docker and Kubernetes
- Maintain and evolve infrastructure supporting high-traffic production services
- Automate operational workflows and reduce manual toil
- Implement and improve CI/CD pipelines
- Build and maintain Infrastructure as Code
- Work closely with developers to improve deployment processes and system operability
Benefits
- Generous Paid Time Off
- 401(k) Matching
- Retirement Plan
- Four-Day Work Week
- Generous Parental Leave
- Tuition Reimbursement
- Relocation Assistance