The Senior Site Reliability Engineer at Stability AI is responsible for improving and shaping the company's cloud infrastructure, collaborating with engineering teams, and driving innovation and reliability.
Requirements
- Developing and enforcing SRE best practices and standards across the organization
- Architecting and managing scalable systems in AWS and other cloud environments
- Implementing and maintaining infrastructure as code using Terraform
- Setting up and refining monitoring, logging, and alerting systems
- Driving incident management and root cause analysis to improve system reliability
Benefits
- Equal Employment Opportunity