Cloudbeds is seeking a Senior Site Reliability Engineer to join its remote team. As a guardian of the company's platform reliability and performance, the successful candidate will design and implement scalable AWS cloud solutions, maintain and support Kubernetes clusters, and collaborate with development and security teams to keep systems reliable and secure.
Responsibilities
- Design and implement reliable and scalable AWS architecture
- Maintain and support highly loaded Kubernetes (EKS) clusters and infrastructure-related components
- Support the CI/CD process with ArgoCD and GitOps
- Automate platform deployments with Terraform (infrastructure as code)
- Develop and continuously improve product observability and monitoring systems
- Respond to incidents and participate in incident management and root cause analysis
- Optimize system performance and troubleshoot issues
- Collaborate with development teams to establish monitoring best practices
- Collaborate with security teams to implement and maintain security best practices
- Participate in the infrastructure support rotation, providing guidance to other engineering teams
Benefits
- Remote First, Remote Always
- PTO in accordance with local labor requirements
- Monthly Wellness Fridays
- Fully Paid Parental Leave
- Home office stipend based on country of residency
- Professional development courses through Cloudbeds University
- Access to professional development, including manager training, upskilling and knowledge transfer