Senior AI Ops Engineer position to architect and deliver the automation layer for fast, reproducible, and scalable model development. The job involves implementing experiment tracking, building CI/CD for ML, designing workflow orchestration, and enabling distributed GPU training.
Requirements
- Implement and operate experiment tracking, lineage, and reproducibility standards
- Build CI/CD for ML: tests, packaging, reproducibility checks, policy gates, and automated deployment and rollback strategies
- Design workflow orchestration for large-scale ML jobs
- Architect, build, and own automated pipelines for model training, fine-tuning, evaluation, and promotion
- Establish standardized training recipes to reduce time-to-first-experiment and improve consistency across teams
- Enable and optimize distributed GPU training
- Develop evaluation harnesses and automated benchmark suites
Benefits
- 401(k) with company matching
- Employee stock purchase program (ESPP)
- Student debt assistance
- Tuition reimbursement program
- Professional development and career growth programs
- Financial planning benefits
- Wellness benefits including an employee assistance program (EAP)
- Paid time off and paid company holidays
- Family care and bonding leave