Reddit is a community of communities. This role is for a Senior Machine Learning Systems Engineer who will lead development of a platform for large-scale ML models.
Requirements
- 5+ years of experience in ML infrastructure, including model training and model deployments
- Hands-on experience with ML optimization, including memory and GPU profiling
- Deep experience with cloud-based technologies for supporting an ML platform
- Hands-on experience administering and integrating MLOps tools for experiment tracking, model serving, and model registries
- Proficiency with the common programming languages and frameworks of ML, such as Python, PyTorch, and TensorFlow
- Deep experience working with distributed training and orchestration tools, including Ray and Kubernetes
- Strong focus on scalability, reliability, performance, and ease of use
- Strong organizational & communication skills
Benefits
- 401(k) program with employer match
- Medical, dental, and vision insurance
- Generous time off for vacation
- Parental leave