We're looking for a Senior Software Engineer to work at the frontier of large-scale LLM serving, partnering directly with customers to unlock the full performance potential of NVIDIA's inference stack. In this role, you'll combine deep systems knowledge with hands-on customer engagement: profiling real deployments, benchmarking across GPU clusters, and turning insights into improvements that ripple across the open-source ecosystem.
Requirements
- Bachelor's, Master's, or PhD degree in Computer Science or Computer Engineering, or equivalent experience.
- 5+ years of industry experience building and operating complex, production-grade software systems, with strong instincts for how systems behave at scale.
- Hands-on experience deploying and operating LLM inference workloads — particularly with vLLM — including configuration, optimization, and debugging in real-world environments.
- Proficiency with container orchestration (Kubernetes) and HPC scheduling (Slurm) for running GPU-accelerated workloads.
- Solid understanding of LLM serving fundamentals: batching strategies (continuous batching, chunked prefill), KV cache management, and tensor/pipeline parallelism.
- Familiarity with GPU performance analysis: memory hierarchy, utilization, roofline modeling, and profiling with Nsight Systems or Nsight Compute.
- Strong written and verbal communication skills, with the ability to present technical findings clearly to both engineering teams and leadership — and to navigate ambiguous, open-ended customer problems.
Benefits
- Comprehensive benefits package
- Highly competitive salaries
- Eligibility for equity