Fathom is hiring an AI Engineer - Model Performance to optimize model inference speed, cost, and reliability, and to build fine-tuning infrastructure for the AI team. The role involves optimizing real systems that serve millions of meetings: weighing quantization trade-offs, debugging speculative decoding, and figuring out why one GPU family's tail latency explodes at high concurrency while another stays stable.
Requirements
- Deep experience with LLM serving frameworks (vLLM, SGLang, TensorRT-LLM, or similar) — not just deploying them, but tuning them: attention backends, scheduling strategies, CUDA graph warmup, prefix caching
- Hands-on quantization experience — you've gone beyond "apply FP8 and hope." You understand weight vs activation quantization, per-channel vs per-tensor scaling, and when dynamic quantization introduces more overhead than it saves
- Production fine-tuning experience — LoRA/QLoRA SFT, familiarity with training frameworks (ms-swift, Axolotl, torchtune, or similar), understanding of data formatting, learning rate schedules, and how to diagnose training failures
- Strong Python. You'll write serving infrastructure, benchmarking harnesses, and training pipelines — not notebooks
- Comfort with GPU profiling and performance analysis. You should be able to look at a benchmark result and know whether the bottleneck is compute, memory bandwidth, or scheduling overhead
Strong signal
- Cost modeling for GPU infrastructure: you've had to choose between GPU types and justify the trade-off
- Experience with multimodal models (audio/vision encoders + LLM decoders)
- Experience with Modal, Ray Serve, or similar serverless GPU platforms
- Understanding of audio processing (codecs, chunking, sample rates)
- Experience building internal tooling that other engineers use — this role succeeds when the rest of the team ships faster
Benefits
- Competitive compensation and benefits
- A dynamic and collaborative engineering team
- A supportive environment that encourages innovation and personal growth
- Opportunity for impact
- Startup experience