Sieve is an AI research lab focused on video data. We're hiring a Reliability Engineer to design and validate infrastructure powering petabyte-scale workloads, build monitoring and alerting platforms, and improve cloud and data security.
Requirements
- 3+ years building internal infrastructure at scale
- Experience on-call for Sev 0 / Sev 1 production incidents (L3 preferred)
- Strong cloud experience (GCP, AWS, Oracle, Cloudflare, etc.)
- Deep Infrastructure-as-Code experience (Terraform preferred)
- Familiarity with Argo, Helm, Kustomize, or similar deployment tools
- Experience operating observability systems (Prometheus, OTel, VictoriaMetrics)
- Backend fundamentals in Python, Go, Rust, or C++
- Strong networking + security intuition, including SSO implementation
- High-ownership mindset over critical systems
Benefits
- 401k
- Full Health Insurance
- Breakfast, lunch, and dinner covered, plus your choice of snacks
- Uber rides home covered