NVIDIA is seeking a Senior Software Engineer with an NCCL and CUDA specialization to join its Cloud Service Provider (CSP) Engagements team. The role focuses on the functionality and performance of the ML software stack for datacenter products such as GB300 and Vera Rubin.
Requirements
- 8+ years of system software validation experience
- BS or MS in Computer Engineering, Computer Science, or related field (or equivalent experience)
- Familiarity with containers, cloud provisioning and scheduling tools such as Docker, Kubernetes, SLURM, and Ansible
- Excellent C/C++ programming and debugging skills, with experience in CUDA development
- Good exposure to PCIe and NVLink
- Deep understanding of operating systems and data-center system architecture
- Knowledge of high-performance networking technologies such as InfiniBand and RoCE
- Proficient understanding of compute, networking, and cloud deployment, specifically on bare metal and VMs
- Strong software architecture experience
- Experience with deep learning training and inference workloads
- Experience conducting performance benchmarking and developing tooling on HPC clusters
Benefits
- Eligible for equity
- Benefits offered