Cloudbeds is a remote-first company that is transforming the hospitality industry with its AI-powered solutions. The Senior Site Reliability Engineer will be responsible for designing and implementing scalable AWS architecture, maintaining and supporting highly loaded Kubernetes clusters, and collaborating with development teams to establish monitoring best practices. The ideal candidate will have 5+ years of experience in DevOps or SRE, expertise in Kubernetes, AWS, and Observability tools, and excellent communication skills.
Requirements
- 5+ years of experience as a DevOps engineer or SRE working within the AWS ecosystem.
- 5+ years of experience with Kubernetes (EKS) and Helm charts.
- Experience designing, building, and supporting CI/CD pipelines with ArgoCD and GitHub Actions.
- Experience with infrastructure-as-code methodologies with Terraform.
- Experience with Observability and Monitoring with Grafana, Prometheus, Datadog, and CloudWatch.
- Experience with Incident Management, full stack troubleshooting, performance analysis and root cause analysis (RCA).
- Experience with web application infrastructure such as Nginx, Ingress controllers, load balancing, and Content Delivery Networks (CDNs).
- Experience with Databases (MySQL, PostgreSQL, Aurora) and Middleware technologies (Redis, Memcached, and SQS).
- Good networking skills with VPC, Security Groups and Network ACLs.
- Ability to work remotely and manage your own time in a global team.
- Good written and verbal communication in English.
- Bachelor’s degree in Computer Science or equivalent experience.
Benefits
- PTO in accordance with local labor requirements
- Monthly Wellness Fridays
- Fully Paid Parental Leave
- Home office stipend based on country of residency
- Professional development courses in Cloudbeds University
- Access to professional development, including manager training, upskilling and knowledge transfer.