Role Overview

Join Nebius' team as a Senior HPC Cluster Engineer to play a key role in developing our cutting-edge hyperscaler platform. The role involves analyzing, troubleshooting, and improving infrastructure to support new hardware, fine-tuning system performance, and automating fault detection and resolution in a complex system.

What You Will Do

Tune the performance of GPU clusters and InfiniBand networks
Analyze and troubleshoot issues related to GPUs and InfiniBand networks
Integrate new hardware into the existing infrastructure
Enhance automation systems for proactive monitoring and issue resolution

Why It Might Be a Fit

We're looking for a Senior HPC Cluster Engineer with 5+ years of professional experience in system-level software development, 3+ years of hands-on experience with Linux systems, and an in-depth understanding of server architecture, including PCIe devices, NICs, the Linux OS and kernel, and high-performance computing (HPC) systems.

Requirements

5+ years of professional experience in system-level software development (focused on performance optimization, low-level programming)
3+ years of hands-on experience with Linux systems (administration, troubleshooting, and performance tuning)
In-depth understanding of server architecture, including PCIe devices, NICs, Linux OS/Kernel, and high-performance computing (HPC) systems
Strong proficiency in one or more performance-oriented programming languages (C/C++, Go, Python)

Benefits

Competitive compensation
Career growth and learning opportunities
Flexibility and work-life balance
Collaborative and innovative culture
Opportunity to work on impactful AI projects
International environment and talented teams
