We are a dedicated research lab for building, understanding, using, and risk-managing foundation models. We're looking for a distributed ML infrastructure engineer to help extend and scale our training systems.

Requirements

5+ years of experience in ML systems, infra, or distributed training
Experience modifying distributed ML frameworks (e.g., DeepSpeed, FSDP, FairScale, Horovod)
Strong software engineering fundamentals (Python, systems design, testing)
Proven multi-node experience (e.g., Slurm, Kubernetes, Ray) and debugging skills (e.g., NCCL/Gloo)
Ability to implement algorithms across GPUs/nodes based on mathematical specs
Experience working on an ML platform/infrastructure team and/or a distributed inference optimization team
Experience with large-scale machine learning workloads (strong ML fundamentals)

Benefits

Comprehensive medical, dental, and vision
401(k) program
Generous PTO, sick leave, and holidays
Paid parental leave and family-friendly benefits
On-site amenities and perks: complimentary lunch, gym access, and a short walk to the Sunnyvale Caltrain station

Machine Learning Infrastructure Engineer

About Institute of Foundation Models
