Bright Vision Technologies is a software development company seeking an AI Infrastructure Engineer to design, build, and operate the platform layer that powers large-scale AI training and inference workloads.
Requirements
- Bachelor’s or Master’s degree in Computer Science or a related field.
- Six or more years of experience in infrastructure, platform, or HPC engineering.
- Hands-on experience operating GPU clusters or large-scale ML training infrastructure.
- Strong proficiency in Python and at least one systems language such as Go or C++.
- Deep understanding of distributed training, accelerator architectures, and collective communication.
- Experience with Kubernetes, Slurm, Ray, or similar scheduling systems for ML workloads.
- Strong understanding of Linux internals, networking, and high-performance storage.
- Experience with at least one major cloud provider’s ML infrastructure offerings.
- Strong software engineering practices including testing, CI/CD, and code review.
- Excellent communication and cross-functional collaboration skills.
Benefits
- Competitive base salary commensurate with experience, plus benefits.
- Long-term, multi-year role aligned to the Bright Vision SOW delivery roadmap.