
Job description
Join NVIDIA's production engineering team to build automation, tooling, and operational systems for large-scale GPU infrastructure. Focus on Kubernetes-based infrastructure, GPU cluster operations, reliability, automation, GitOps, and Day 2 operability.
Build and operate automation for large-scale GPU clusters, and develop tools and services for provisioning, validation, upgrades, monitoring, repair, and cluster lifecycle operations.
Requires 8+ years of experience in production infrastructure, strong programming skills in Python, Go, or a similar language, and the ability to troubleshoot distributed systems in production.
Company

Tech, Software & IT Services
NVIDIA, founded in 1993, is a leading full‑stack computing company that designs and manufactures GPUs and related technologies. Its products power a wide spectrum of applications—from high‑performance gaming and professional graphics to AI, deep learning, and autonomous vehicle systems—while its data‑center solutions enable large‑scale supercomputing and virtualization. NVIDIA’s pioneering GPU architecture has driven the growth of PC gaming, catalyzed the modern AI era, and continues to shape emerging fields such as the metaverse. The company’s integrated hardware‑software ecosystem delivers unprecedented performance and scalability, positioning NVIDIA as a key enabler of next‑generation computing across automotive, robotics, and enterprise sectors.