We are seeking highly skilled software engineers to join NVIDIA and build AI inference systems that serve large-scale models with extreme efficiency. You'll architect and implement high-performance inference stacks, optimize GPU kernels and compilers, and scale workloads across multi-GPU, multi-node, and multi-cloud environments.
Requirements
- Bachelor's degree (or equivalent experience) in Computer Science (CS), Computer Engineering (CE), or Software Engineering (SE) with 7+ years of experience; alternatively, a Master's degree in CS/CE/SE with 5+ years of experience; or a PhD with a thesis and top-tier publications in ML systems, GPU architecture, or high-performance computing.
- Strong programming skills in Python and C/C++; experience with Go or Rust is a plus; solid CS fundamentals: algorithms and data structures, operating systems, computer architecture, parallel programming, distributed systems, and deep learning theory.
- Knowledge of and passion for performance engineering in ML frameworks (e.g., PyTorch) and inference engines (e.g., vLLM, SGLang).
- Familiarity with GPU programming and performance: CUDA, memory hierarchy, streams, NCCL; proficiency with profiling and debugging tools (e.g., Nsight Systems/Compute).
- Experience with containers and orchestration (Docker, Kubernetes, Slurm); familiarity with Linux namespaces and cgroups.
- Excellent debugging, problem-solving, and communication skills; ability to excel in a fast-paced, cross-functional setting.
Benefits
- Eligible for equity and benefits