The Data Center Operations Engineer is responsible for supporting, maintaining, and deploying critical data center infrastructure with a strong focus on Linux-based systems, GPU server deployments, and InfiniBand networking.
Responsibilities and Requirements
- Provide hands-on operational support for all data center projects, deployments, and repair activities.
- Troubleshoot and resolve operational issues related to Linux servers, GPU platforms, networking, and storage infrastructure.
- Install, configure, test, and maintain server hardware, including racking and stacking, labeling, and servicing components such as HDDs, memory, CPUs, RAID batteries, and NICs.
- Maintain accurate documentation of operational procedures, system configurations, and runbooks.
- Collaborate with global teams across time zones to support operational initiatives and continuous improvement efforts.
- Contribute to process improvement initiatives and ensure adherence to documented policies, processes, and procedures.
- Manage incident tickets, maintain acceptable ticket loads, and meet SLAs.
- Ability to perform the essential functions of the role, including lifting, moving, and installing equipment weighing 50 pounds or more, with or without reasonable accommodation.
- Ability to work in data center environments, including raised floors, equipment racks, and confined spaces.
- Willingness to work flexible hours, including nights, weekends, and on-call rotations as required.
Benefits
- Generous Paid Time Off
- 401(k) Matching
- Retirement Plan
- Tuition Reimbursement