Role Overview

NVIDIA is looking for a Software Engineer to work on bring-up, triage, benchmarking, analysis, and optimization of distributed training and inference workloads across NVIDIA GPU platforms at the largest scales.

What You Will Do

Bring up, validate, and debug large-scale AI clusters, infrastructure, and end-to-end workloads
Bring up, tune, and benchmark AI pre-training, post-training, and inference workloads using PyTorch, NeMo / Megatron, TensorRT-LLM, and adjacent NVIDIA AI software stacks

Why It Might Be a Fit

3+ years of experience developing software for AI, HPC, or systems-level applications
Hands-on experience with multi-GPU or multi-node workloads and CUDA-aware distributed execution
Excellent analytical, debugging, and communication skills, and a collaborative approach across teams

Requirements

Bachelor’s or Master’s in Computer Science or a related technical field (or equivalent experience)
3+ years of experience developing software for AI, HPC, or systems-level applications
Hands-on experience with multi-GPU or multi-node workloads and CUDA-aware distributed execution
Background in debugging and scaling distributed systems
Experience debugging and triaging AI applications across the full stack, from the application level toward the hardware
Experience operating workloads in scheduled, containerized cluster environments
Strong Python and C/C++ programming skills

Benefits

equity
benefits

Software Engineer, DGX Cloud AI Infrastructure
