AI Infrastructure Engineer responsible for designing, building, and maintaining scalable, robust infrastructure for AI and machine learning workloads.
Requirements
- Bachelor’s or higher degree in Computer Science, Engineering, or related technical field
- 3+ years of experience in infrastructure engineering, preferably with a focus on AI, machine learning, or high-performance computing environments
- Cloud skills – GCP/OpenShift, Kubernetes (k8s), Docker containers/images
- AI skills – Model training, testing/evaluation, deployment
- MLOps/LLMOps
- LLM and GenAI core skills – how LLMs work under the hood; inference mechanics of LLMs/GenAI
- Inference scaling, distributed computing, inference benchmarking, inference planning for meeting SLAs/SLOs
- GPUs and how to work with them; handling distributed workloads; autoscaling
- NVIDIA NIM, Hugging Face
- NVIDIA SuperPODs (HPC, Slurm, k8s)
- Monitoring and dashboards for LLM/ML workloads and applications
- AI application architecture know-how, end-to-end flows
- DevOps (CI/CD, Argo CD, Git, Jenkins, etc.)
- Languages: Python, SQL