We are seeking a Lead Site Reliability Engineer (Infrastructure) to join our fast-moving VSaaS engineering organization. This role is responsible for the technical leadership and operational execution of the Infrastructure SRE team.
Requirements
- 10+ years of experience in site reliability engineering, infrastructure, or systems engineering
- Strong hands-on experience designing and building automation and operational tooling using Golang and/or Python
- Advanced expertise in cloud-native and IaaS architectures, distributed systems, and container orchestration in production environments
- Deep understanding of SRE and DevOps principles, including incident management, SLA/SLO ownership, automation, and reliability engineering practices, plus experience leading incident response, post-incident analysis, and preventive improvements
- Strong experience with CI/CD pipelines, GitOps workflows, release tooling, and modern cloud-native infrastructure practices to ensure reliable, traceable software and infrastructure changes
- Hands-on experience operating Docker and Kubernetes environments, observability platforms (logging, monitoring, alerting), and SQL/NoSQL databases (e.g., Postgres, MongoDB, graph databases)
Benefits
- Medical/dental benefits
- FSA or HSA
- 401(k) with 6% Safe Harbor employer match
- Paid parental leave
- Generous PTO (20 days of vacation, 10 days of paid sick time, and 12 company holidays)
- Fully paid short-term disability policy
- Fully paid long-term disability policy
- Life insurance