Lead Solution Architect to work on Kubernetes platforms, HPC, the infrastructure and hardware stack, and AI platform enablement. The role requires an average of 2 days per week in an HPE office.
Requirements
- Proficiency in container orchestration platforms, especially Red Hat OpenShift, SUSE Rancher, and CNCF Kubernetes.
- Experience with GPU-accelerated workloads and tools like NVIDIA GPU Operator and DCGM.
- Proven ability to integrate Kubernetes with AI/ML workloads and GPU infrastructure in hybrid or private cloud environments.
- Experience architecting HPC clusters, including GPU/compute nodes and HPC storage technologies (e.g., parallel filesystems such as Lustre and WEKA).
- Understanding of high-speed networking (e.g., InfiniBand, Mellanox, RoCE).
- Experience with HPC cluster management tools such as HPE Performance Cluster Manager (HPCM) or NVIDIA Base Command Manager.
- Familiarity with HPC workload schedulers like Slurm or Altair PBS Pro.
- Strong background in Linux system administration (RHEL, SLES, Ubuntu) and virtualization (KVM or OpenShift Virtualization).
- Primary focus on HPE servers and storage; experience with Dell, Lenovo, Supermicro, NVIDIA DGX/HGX platforms is acceptable.
- Solid understanding of compute, storage, and networking fundamentals in enterprise environments.
- Experience deploying and supporting NVIDIA AI Enterprise and related AI/ML frameworks.
- Familiarity with DevOps/MLOps practices, including CI/CD, Infrastructure as Code (IaC), and cloud-native security.
- Demonstrated success in infrastructure project delivery, including platform build-outs for AI/ML and HPC workloads.
- Ability to collaborate with cross-functional teams and align technical solutions with business goals.
- Ability to develop solutions that enhance the availability, performance, maintainability, and agility of a customer's enterprise environment.
Benefits
- Health & Wellbeing
- Personal & Professional Development
- Unconditional Inclusion