Solace is seeking a Senior Cloud Site Reliability Engineer to lead the daily operations of Solace Cloud, our market-leading SaaS offering, across leading cloud providers and platforms. The ideal candidate will have expertise in cloud networking solutions, site reliability engineering, and incident response, with a strong focus on troubleshooting and debugging in complex cloud-based environments.
Requirements
- Ensure that Solace Cloud services are healthy and reliable and that SLAs are met
- Design and implement our infrastructure tooling, observability, and automation
- Contribute to making production operations more efficient and less error-prone
- Expert-level knowledge of handling incidents in production-grade multi-cloud environments according to industry-standard incident management processes
- Process service and provisioning requests from customers
- Proven ability to manage customer escalations and drive resolution in mission-critical, high-impact production environments
- Work directly with customers to identify, troubleshoot, and resolve operational issues
- Expert knowledge of debugging Linux and Kubernetes to detect operational issues
- Participate in the on-call rotation and provide 24x7 off-hours support
Benefits
- Flexible work hours
- Competitive salary
- Comprehensive benefits package
- Opportunities for professional growth and development
- Hybrid work environment
- Fun and creative work environment