Join Stellar Cyber, a fast-growing global leader in cybersecurity, as a Staff Site Reliability Engineer to drive reliability, scalability, and efficiency across production systems. As a senior member of the SRE team, you will operate complex distributed systems and influence architecture, tooling, and best practices to ensure operational excellence.
Requirements
- 5+ years of experience in Site Reliability Engineering, DevOps, or Platform Engineering roles.
- Proven success running large-scale production systems in cloud environments (AWS, GCP, Azure, or OCI).
- Demonstrated leadership in driving incident response, on-call best practices, and a reliability-focused culture.
- Advanced proficiency in Kubernetes administration and troubleshooting.
- Strong experience with production on-call operations and incident management.
- Hands-on experience with observability tools: Prometheus, Grafana, Loki, and Alertmanager.
- Expertise in operating data platforms (Elasticsearch, MongoDB, Spark, Kafka, Redis).
- Proficiency with public cloud services (AWS, Azure, GCP, or OCI).
- Strong programming and automation skills in Python and Bash.
- Deep understanding of Infrastructure as Code (Terraform, Helm).
- Experience with CI/CD pipelines (GitHub Actions, Bitbucket Pipelines, Argo CD).
- Strong technical background in distributed systems, databases, networking, and Linux administration.
- Excellent problem-solving, communication, and leadership abilities.
- Bachelor's degree in Computer Science, Engineering, or a related technical field.
Benefits
- Generous paid time off
- 401(k) matching
- Tuition reimbursement
- Relocation assistance
- Health insurance
- Dental insurance
- Vision insurance