Cloudbeds is a remote-first company that is transforming the hospitality industry with its AI-powered solutions. The Senior Site Reliability Engineer will be responsible for designing and implementing scalable AWS architecture, maintaining and supporting highly loaded Kubernetes clusters, and collaborating with development teams to establish monitoring best practices. The ideal candidate will have 5+ years of experience in DevOps or SRE, expertise in Kubernetes, AWS, and Observability tools, and excellent communication skills.

Requirements

5+ years of experience as a DevOps or SRE working within the AWS ecosystem.
5+ years of experience with Kubernetes (EKS) and Helm charts.
Experience with designing, building, and supporting CI/CD pipelines with ArgoCD and GitHub Actions.
Experience with infrastructure-as-code methodologies with Terraform.
Experience with Observability and Monitoring with Grafana, Prometheus, Datadog, and CloudWatch.
Experience with Incident Management, full-stack troubleshooting, performance analysis, and root cause analysis (RCA).
Experience with Web application systems such as Nginx, Ingress controllers, load balancing and Content Delivery Networks.
Experience with Databases (MySQL, PostgreSQL, Aurora) and Middleware technologies (Redis, Memcached, and SQS).
Good networking skills with VPC, Security Groups and Network ACLs.
Ability to work remotely and manage your own time in a global team.
Good written and verbal communication in English.
Bachelor’s degree in Computer Science or equivalent experience.

Benefits

PTO in accordance with local labor requirements
Monthly Wellness Fridays
Fully Paid Parental Leave
Home office stipend based on country of residency
Professional development courses in Cloudbeds University
Access to professional development, including manager training, upskilling and knowledge transfer.
