Join Obsidian as a Site Reliability Engineer. You will work on reliability challenges across a large-scale distributed SaaS platform, build and improve observability and operational tooling, and gain hands-on experience with cloud infrastructure, Kubernetes, and production systems.
Requirements
- 2–5 years of experience in Site Reliability Engineering, DevOps, Production Engineering, or related roles
- Experience operating and supporting production systems in AWS and/or GCP
- Familiarity with Kubernetes and Helm in cloud-native environments
- Experience with observability and monitoring tools such as Prometheus, Grafana, Datadog, or similar platforms
- Exposure to CI/CD systems such as GitLab CI/CD, GitHub Actions, ArgoCD, or equivalent
- Strong troubleshooting and debugging skills across distributed systems and microservices
- Experience writing automation or infrastructure tooling using scripting or programming languages
- Strong systems thinking and a collaborative engineering mindset
Benefits
- Competitive compensation with equity and 401(k)
- Comprehensive healthcare with dental and vision coverage
- Flexible paid time off and paid holidays
- 12 weeks of new parent or family leave
- Personal and professional development resources