Cloudbeds is a software company that provides a platform for hospitality properties. As a Senior Site Reliability Engineer, you will be responsible for designing and implementing scalable AWS architecture, maintaining and supporting Kubernetes clusters, and collaborating with development teams to establish monitoring best practices. The company is remote-first and offers a range of benefits, including PTO, wellness Fridays, and professional development courses.
Requirements
- 5+ years of experience as a DevOps engineer or SRE working within the AWS ecosystem
- 5+ years of experience with Kubernetes (EKS) and Helm charts
- Experience designing, building, and supporting CI/CD pipelines with ArgoCD and GitHub Actions
- Experience with infrastructure-as-code methodologies with Terraform
- Experience with observability and monitoring using Grafana, Prometheus, Datadog, and CloudWatch
- Experience with incident management, full-stack troubleshooting, performance analysis, and root cause analysis (RCA)
- Experience with web application systems such as Nginx, ingress controllers, load balancing, and content delivery networks
- Experience with databases (MySQL, PostgreSQL, Aurora) and middleware technologies (Redis, Memcached, and SQS)
- Good networking skills, including VPCs, security groups, and network ACLs
- Ability to work remotely and manage your own time in a global team
- Good written and verbal communication in English
- Bachelor’s degree in Computer Science or equivalent experience
Benefits
- Remote First, Remote Always
- PTO in accordance with local labor requirements
- Monthly Wellness Fridays - enjoy an extra long weekend every month
- Fully Paid Parental Leave
- Home office stipend based on country of residency
- Professional development courses through Cloudbeds University
- Access to professional development, including manager training, upskilling and knowledge transfer