We're looking for a Research Scientist to lead work on post-training data curation for foundation models. You'll design and implement algorithms to generate and improve instruction, preference, and other post-training datasets.
Requirements
- 3+ years of deep learning research experience
- Experience with post-training large vision, language, and multimodal models
- Post-training algorithm development, data curation, and/or synthetic data methods for:
  - Preference-based tuning (e.g. DPO, RLVR, RRHF)
  - Alternative supervision & self-supervision techniques such as self-training and chain-of-thought distillation
  - SFT (e.g. instruction tuning and demonstration fine-tuning)
  - Post-training tooling development and engineering experience
- Strong understanding of the fundamentals of deep learning
- Sufficient software engineering and deep learning framework skills (PyTorch, or a willingness to learn it) to conduct large-scale research experiments and build production prototypes.
- Demonstrated track record in deep learning research, shown through papers, tools, or other research artifacts.
Benefits
- 100% covered health benefits (medical, vision, and dental).
- 401(k) plan with a generous 4% company match.
- Unlimited paid time off (PTO) policy.
- Annual $2,000 wellness stipend.
- Annual $1,000 learning and development stipend.
- Daily lunches and snacks provided in our office.
- Relocation assistance for employees moving to the Bay Area.