Zefr is seeking a Manager of Machine Learning Operations to lead the ML Ops team and drive the infrastructure, tooling, and processes that enable machine learning systems to operate at scale. The role involves overseeing the deployment, monitoring, and optimization of ML models, leading a team of engineers, and collaborating with ML Engineers and Data Scientists.
Responsibilities
- Lead, mentor, and grow a team of Machine Learning Engineers
- Design and implement scalable ML infrastructure for model training, deployment, and serving
- Establish and enforce best practices for ML model lifecycle management
- Develop and maintain CI/CD pipelines for machine learning workflows
- Optimize model inference performance and reduce latency/cost across production systems
- Collaborate with ML Engineers and Data Scientists to productionize models efficiently
- Implement robust monitoring, alerting, and observability solutions for ML systems
- Drive technical decisions on ML Ops tooling, infrastructure, and architecture
- Ensure high availability and reliability of ML services at scale
- Manage project timelines, priorities, and resource allocation for the ML Ops team
Benefits
- Flexible PTO
- Medical, dental, and vision insurance with FSA options
- Company-paid life insurance
- Paid parental leave
- 401(k) with company match
- Professional development opportunities
- 14 paid holidays
- Flexible hybrid work schedule
- Summer Fridays
- In-office lunches and lots of free food
- Optional in-person and virtual events