The Core Network Engineering team at OpenAI owns the end-to-end networking stack that connects OpenAI’s compute infrastructure. We’re looking for engineers to help build and operate the networking foundation behind OpenAI’s frontier AI systems.
Requirements
- Design, build, and operate networking systems that support large-scale AI training and inference infrastructure
- Improve performance, reliability, and scalability across host networking, datacenter fabrics, and WAN systems
- Develop automation for provisioning, configuration management, validation, upgrades, and lifecycle management of networking infrastructure
- Build tooling and observability systems for network health, performance analysis, debugging, and automated remediation
- Optimize network performance across technologies such as RDMA, RoCE, InfiniBand, Ethernet, and high-performance GPU interconnects
- Define and operationalize networking protocols, readiness criteria, and continuous validation systems
- Partner closely with compute, storage, hardware, and infrastructure teams to ensure networking scales predictably with fleet growth
- Contribute to architecture decisions around topology design, capacity planning, failure domains, and network reliability
- Diagnose complex distributed systems and networking issues across large heterogeneous compute environments