Lead Solution Architect to work on Kubernetes platforms, HPC, the infrastructure and hardware stack, and AI platform enablement. The role involves an average of 2 days per week in an HPE office.

Requirements

Proficiency in container orchestration platforms, especially Red Hat OpenShift, SUSE Rancher, and CNCF Kubernetes.
Experience with GPU-accelerated workloads and tools like NVIDIA GPU Operator and DCGM.
Proven ability to integrate Kubernetes with AI/ML workloads and GPU infrastructure in hybrid or private cloud environments.
Experience architecting HPC clusters, including GPU/compute nodes and HPC storage technologies such as parallel filesystems (e.g., Lustre, WEKA).
Understanding of high-speed networking (e.g., InfiniBand, Mellanox, RoCE).
Experience with HPC cluster management tools such as HPE Performance Cluster Manager (HPCM) or NVIDIA Base Command Manager.
Familiarity with HPC workload schedulers like Slurm or Altair PBS Pro.
Strong background in Linux system administration (RHEL / SLES / Ubuntu) and virtualization (KVM or OpenShift Virtualization).
Primary focus on HPE servers and storage; experience with Dell, Lenovo, Supermicro, or NVIDIA DGX/HGX platforms is also acceptable.
Solid understanding of compute, storage, and networking fundamentals in enterprise environments.
Experience deploying and supporting NVIDIA AI Enterprise and related AI/ML frameworks.
Familiarity with DevOps/MLOps practices, including CI/CD, Infrastructure as Code (IaC), and cloud-native security.
Demonstrated success in infrastructure project delivery, including platform build-outs for AI/ML and HPC workloads.
Ability to collaborate with cross-functional teams and align technical solutions with business goals.
Ability to develop solutions that enhance the availability, performance, maintainability, and agility of a customer's enterprise environment.

Benefits

Health & Wellbeing
Personal & Professional Development
Unconditional Inclusion

