
Job description
Design and optimize large-scale pre-training systems that power Mindbeam’s generative AI models. Build scalable pre-training pipelines for foundation models, optimizing throughput and efficiency.
Implement distributed training strategies across GPUs/TPUs and high-performance clusters. Develop monitoring and fault-tolerance systems to ensure reliable large-scale training. Continuously benchmark and tune performance across hardware and software stacks.
You thrive on scale and complexity. You enjoy solving system-level bottlenecks, pushing hardware and software to their limits, and working closely with researchers to accelerate cutting-edge AI development.
Company

Tech, Software & IT Services
Mindbeam AI specializes in delivering cutting-edge AI infrastructure solutions for businesses leveraging generative AI and large language models (LLMs). We provide a comprehensive suite of services including pre-training, fine-tuning, inference, and LLM optimization, all powered by advanced GPU optimization techniques. We empower organizations to efficiently deploy and scale AI applications with a focus on accelerated computing and cloud solutions (AWS, NVIDIA). Mindbeam AI offers both direct service delivery and expert consulting, helping clients maximize performance, reduce costs, and implement sustainable, energy-efficient AI strategies.