Join Baseten, a startup revolutionizing AI deployment with cutting-edge inference infrastructure, as a Site Reliability Engineer. You'll build robust systems and processes to ensure infrastructure scalability, reliability, and efficiency. Work closely with users, automate processes, and mentor junior team members.
Requirements
- Bachelor's, Master's, or Ph.D. degree in Computer Science, Engineering, Mathematics, or a related field
- 3+ years of professional work experience in a fast-paced, high-growth environment
- Extensive experience with Kubernetes
- Experience in building and maintaining scalable infrastructure
- Experience with infrastructure-as-code tools (e.g., Terraform, CloudFormation, Pulumi) and CI/CD tooling (e.g., GitHub Actions, GitLab CI, CircleCI, Jenkins)
- Relevant OSS observability experience (Prometheus, ELK stack, Grafana stack, OpenTelemetry) is a plus
Benefits
- Competitive compensation package (unlimited PTO, 401(k), covered healthcare premiums)
- A unique opportunity to be part of a rapidly growing startup in one of the most exciting engineering fields of our era
- An inclusive and supportive work culture that fosters learning and growth
- Exposure to a variety of ML startups, offering unparalleled learning and networking opportunities