Hewlett Packard Enterprise is seeking an HPC Linux System Administrator to manage and maintain HPC clusters, administer Linux/Unix-based systems, and oversee lab systems used for development, testing, and release validation in HPC environments.
Requirements
- Be hands-on, with the ability to develop a solid understanding of Linux systems and to test them.
- Manage and maintain HPC clusters, including installation, configuration, and optimization of compute and management nodes.
- Administer Linux/Unix-based systems, ensuring high availability, performance, and security.
- Perform system imaging, software provisioning, and configuration management using tools such as Ansible.
- Conduct hardware troubleshooting and coordinate with vendors or internal teams for hardware repairs and replacements.
- Manage storage systems (NFS, Lustre, GPFS, RAID) and ensure efficient data flow across the HPC environment.
- Monitor system performance, perform regular health checks, and implement preventive maintenance measures.
- Apply OS, firmware, and security updates to maintain system stability and compliance.
- Develop and maintain automation scripts (using Bash, Python, or Ansible) to improve operational efficiency.
- Document system configurations, maintenance procedures, and troubleshooting guides.
- Collaborate with cross-functional teams across geographies to resolve issues, plan upgrades, and support project activities.
- Provide guidance and mentoring to less-experienced staff members.
Benefits
- Health & Wellbeing
- Personal & Professional Development
- Unconditional Inclusion