We're looking for a Staff ML Engineer to drive the model serving layer for voice workloads. You'll optimize latency and throughput through hands-on work with inference engines and GPU utilization. This is a foundational hire on a small, early-stage team that will shape how Together serves voice models as the industry moves toward end-to-end speech-to-speech. You'll own the model serving stack, work with state-of-the-art accelerators, and collaborate with model partners to bring their models to production, with outsized impact on a fast-growing product area.
Requirements
- 8+ years of ML engineering experience, with a demonstrated focus on model serving, inference optimization, or ML infrastructure at production scale
- Deep, practical expertise in LLM serving engines (vLLM, SGLang, TensorRT-LLM, or equivalent)
- Expert-level Python and PyTorch proficiency, with a strong command of GPU optimization
- Sound system design judgment and strong technical leadership
- Sharp product intuition for developer tooling
- Proven ability to move fast in ambiguous environments
- Strong foundation in speech and audio ML (ASR/TTS architectures, audio signal processing)
- Familiarity with audio codecs and tokenization schemes
- Experience training or fine-tuning speech models at scale
Benefits
- Competitive compensation
- Startup equity
- Health insurance
- Other competitive benefits