We are seeking a Lead Software Engineer to build the infrastructure that makes transparent and accessible AI breakthroughs possible. You will bridge the gap between our researchers, our orchestration platform (Beaker), and our GPU clusters.
Requirements
- 10+ years of professional experience developing business-critical software and operating large-scale compute infrastructure.
- Proficiency in Go and/or Python preferred.
- Bachelor’s degree in a related field; a relevant advanced degree may substitute for equivalent years of technical work experience.
- Deep Linux Expertise: Expert-level knowledge of Linux internals and container runtimes like Docker.
- Distributed Systems Mastery: A proven track record of designing, debugging, and optimizing high-scale distributed systems and databases.
- HPC Foundations: Applied experience with workload schedulers (like Kubernetes or Slurm) and high-performance networking (NCCL and InfiniBand).
- Cloud & Hardware Hybridity: Familiarity with the nuances of on-prem GPU cluster management and cloud infrastructure (GCP, AWS).
- Communication: Exceptional writing skills and the ability to drive consensus across diverse groups of researchers and engineers.
- A principled approach to engineering: you care about how systems are built and are excited by the unique constraints and freedoms of a non-profit research environment.
Benefits
- Generous paid vacation and sick leave
- Up to 20 vacation days per year
- Up to 12 paid holidays throughout the calendar year
- Health savings account plan
- Healthcare reimbursement arrangement plan
- Health care and dependent care flexible spending account plans
- 401(k) plan
- Annual bonuses
- Long-term incentive plan
- Fitness and wellbeing expenses
- Commuting or internet expenses assistance