We're looking for a Senior Software Engineer to work at the frontier of large-scale LLM serving, partnering directly with customers to unlock the full performance potential of NVIDIA's inference stack. As a Senior Software Engineer, you'll combine deep systems knowledge with hands-on customer engagement, profiling real deployments, benchmarking across GPU clusters, and turning insights into improvements that ripple across the open-source ecosystem.

Requirements

Bachelor's, Master's, or PhD in Computer Science, Computer Engineering, or equivalent experience.
5+ years of industry experience building and operating complex, production-grade software systems, with strong instincts for how systems behave at scale.
Hands-on experience deploying and operating LLM inference workloads — particularly with vLLM — including configuration, optimization, and debugging in real-world environments.
Proficiency with container orchestration (Kubernetes) and HPC scheduling (Slurm) for running GPU-accelerated workloads.
Solid understanding of LLM serving fundamentals: batching strategies (continuous batching, chunked prefill), KV cache management, and tensor/pipeline parallelism.
Familiarity with GPU performance analysis: memory hierarchy, utilization, roofline modeling, and profiling with Nsight Systems or Nsight Compute.
Strong written and verbal communication skills, with the ability to present technical findings clearly to both engineering teams and leadership — and to navigate ambiguous, open-ended customer problems.

Benefits

Comprehensive benefits package
Highly competitive salaries
Eligibility for equity

Senior Software Engineer, AI Inference
