We are seeking a staff ML engineer to design and evolve the large-scale offline platform for Unity Vector. The role focuses on building reliable infrastructure for generating training datasets, orchestrating ML workflows, and enabling efficient, distributed model training at scale.

Requirements

Design and operate large-scale data pipelines that generate datasets for machine learning training and experimentation
Develop infrastructure that supports distributed training workflows using technologies such as PyTorch, Ray Data, and Ray Train
Integrate ML pipelines with workflow orchestration systems (e.g., Flyte, Airflow, or similar) to enable reliable multi-stage training workflows
Improve reproducibility and observability of ML pipelines through dataset validation, monitoring, and automated testing
Optimize performance and resource utilization across distributed compute systems used for data processing and model training
Partner closely with ML engineers to enable efficient large-scale experimentation and model iteration
Lead architectural improvements to ensure our offline ML pipelines remain scalable, reliable, and cost-efficient

Benefits

Comprehensive health, life, and disability insurance
Commuter subsidy
Employee stock ownership
Competitive retirement/pension plans
Generous vacation and personal days
Support for new parents through leave and family-care programs
Office food and snacks
Mental health and wellbeing programs and support
Employee Resource Groups
Global Employee Assistance Program
Training and development programs
Volunteering and donation matching program

Staff Machine Learning Engineer, Offline Infrastructure
