We are seeking a Director of AI Infrastructure to oversee the systems that power our research at Ai2, a non-profit research institute at the forefront of open-source AI development. This leader will be responsible for the full lifecycle of our high-performance computing (HPC) environment, which includes on-prem GPU clusters and the software orchestration layer that schedules workloads across a hybrid cloud environment.
Requirements
- 12+ years in infrastructure, systems engineering, or HPC, with at least 5 years in a leadership role managing multi-disciplinary engineering teams.
- Direct experience managing large-scale NVIDIA GPU clusters and high-performance networking (InfiniBand/RoCE).
- Strong background in Kubernetes, Slurm, or similar orchestration frameworks, particularly in hybrid-cloud configurations.
- Experience with distributed filesystems (e.g., WEKA, Ceph, Lustre) and cloud storage integration at scale.
- Proficiency in Go or Python, with the ability to review architecture and code for our internal tooling.
Benefits
- Medical, dental, vision, and employee assistance program
- Health savings account plan, healthcare reimbursement arrangement plan, and health care and dependent care flexible spending account plans
- Company’s 401(k) plan
- $125 per month to assist with commuting or internet expenses
- $200 per month for fitness and wellbeing expenses
- Up to ten sick days per year, up to seven personal days per year, up to twenty vacation days per year, and twelve paid holidays throughout the calendar year
- Annual bonuses and participation in the long-term incentive plan