
Job description
As an HPC Operations Engineer at Lambda, you will be responsible for remotely deploying and configuring large-scale HPC clusters for AI workloads, troubleshooting and resolving HPC cluster issues, and providing clear and detailed requirements back to other engineering teams.
You will work with a team of experienced engineers and have the opportunity to mentor and assist less experienced team members. You will also have the chance to stay up to date on the latest HPC/AI technologies and best practices.
Company

Tech, Software & IT Services
Lambda builds and operates a cloud-based superintelligence platform that delivers end-to-end AI infrastructure for deep learning, large language models, and generative AI. Leveraging high-performance GPUs and distributed training, the company enables rapid development and deployment of foundation models across industries. Its focus on scalable AI factories and a unified cloud architecture differentiates Lambda as a leader in accelerating AI adoption. The culture prioritizes collaboration, innovation, and a commitment to responsible AI, making it an attractive environment for talent eager to shape the future of intelligent systems.