We're hiring an Inference Engineer to advance our mission of building real-time multimodal intelligence. In this role, you'll design and build a low-latency, scalable, and reliable model inference and serving stack for our cutting-edge foundation models.
Requirements
- Strong engineering skills, comfortable navigating complex codebases and monorepos
- An eye for craft and a habit of writing clean, maintainable code
- Experience building large-scale distributed systems with high demands on performance, reliability, and observability
- Technical leadership with the ability to execute and deliver zero-to-one results amidst ambiguity
- Experience designing best practices and processes for monitoring and scaling large-scale production systems
- Background in or experience building inference pipelines for machine learning and generative models
- Experience with CUDA, Triton, or similar
Benefits
- Lunch, dinner and snacks at the office
- Fully covered medical, dental, and vision insurance for employees
- 401(k)
- Relocation and immigration support
- Your own personal Yoshi