The Data Center Operations Engineer is responsible for supporting, maintaining, and deploying critical data center infrastructure with a strong focus on Linux-based systems, GPU server deployments, and InfiniBand networking.
Responsibilities
- Provide hands-on operational support for all data center projects, deployments, and repair activities.
- Troubleshoot and resolve operational issues related to Linux servers, GPU platforms, networking, and storage infrastructure.
- Conduct daily health checks of Linux systems and infrastructure components, proactively identifying and mitigating risks.
- Maintain accurate documentation of operational procedures, system configurations, and runbooks.
- Follow established incident management and escalation procedures, and adhere to service-level agreements (SLAs).
Benefits
- Generous Paid Time Off
- 401(k) Matching
- Retirement Plan
- Visa Sponsorship