As a Sr. Site Reliability Engineer, you'll be the guardian of our platform's reliability and performance, ensuring millions of hospitality transactions flow seamlessly across the globe. You'll architect and implement scalable AWS cloud solutions that keep the most ambitious hotels running 24/7, while fostering a culture of automation, resilience, and continuous improvement across our engineering teams.

Requirements

Design and implement reliable and scalable AWS architecture to meet the needs of the organization.
Maintain and support highly loaded Kubernetes (EKS) clusters and infrastructure-related components.
Support the CI/CD process with ArgoCD and GitOps.
Automate the platform deployments with Terraform infrastructure-as-code.
Develop and continuously improve product observability and monitoring systems based on Grafana, Prometheus, Datadog, and CloudWatch.
Respond to incidents and participate in Incident Management and Root Cause Analysis, ensuring minimal impact on services.
Optimize system performance and troubleshoot issues as they arise.
Collaborate with development teams to establish monitoring best practices and ensure systems meet reliability targets.
Collaborate with security teams to implement and maintain security best practices.
Participate in the infrastructure support rotation, providing guidance to other engineering teams.

Benefits

Remote First, Remote Always
PTO in accordance with local labor requirements
Monthly Wellness Fridays - enjoy an extra-long weekend every month
Fully Paid Parental Leave
Home office stipend based on country of residency
Professional development courses in Cloudbeds University
Access to professional development, including manager training, upskilling and knowledge transfer.

