Join a fast-moving AI infrastructure team at the cutting edge of large-scale ML workloads as a Solutions Architect - AI / ML - Training & GPU Infrastructure. You will design and validate production-grade distributed training and large-scale inference architectures on large GPU clusters.
Requirements
- Hands-on experience designing and operating production-grade, multi-node GPU workloads for training or inference
- Strong background in distributed deep learning (PyTorch Distributed, DeepSpeed) on GPU clusters
- Deep understanding of GPU architecture and interconnects (H100/A100-class GPUs, NVLink, InfiniBand)
- Experience with Kubernetes or Slurm, and with performance tuning using GPU profiling and monitoring tools
Benefits
- Total compensation of up to EUR 300k (base + variable), depending on level and experience
- Location: Remote from anywhere in Europe