The organization operates one of the largest GPU infrastructures in the world and is looking for a Senior Site Reliability Engineer to ensure the fault tolerance, scalability, and uninterrupted operation of the service.
Requirements
- Solid programming experience, beyond scripting, in a language such as Go, Python, or C++
- Good understanding of classic algorithms and data structures
- Commercial experience with, and deep understanding of, Unix/Linux systems and network technology
- Solid experience with CI/CD and IaC
- Experience with containerization and configuration management (Ansible, Salt, Terraform, Docker, Kubernetes, Helm)
- Desire to be involved in backend development
- Experience designing, developing, and running high-load distributed systems
- Experience with a variety of cloud platforms
Benefits
- Competitive salary
- Comprehensive benefits package
- Opportunities for professional growth
- Flexible working arrangements
- Dynamic and collaborative work environment