Join India’s most talent-dense robotics team and push the frontiers of generative and multimodal learning that power our autonomous robots.
Responsibilities
- Design and train diffusion-based generative models for realistic, high-resolution synthetic data.
- Build compact Vision–Language Models (VLMs) to caption, query and retrieve job-site scenes for downstream perception tasks.
- Develop Vision–Language–Action (VLA) model objectives that link textual work orders with pixel-level segmentation masks.
- Architect large-scale auto-annotation pipelines that turn unlabeled images and point clouds into high-quality labels with minimal human input.
- Benchmark model performance on accuracy, latency and memory for deployment on Jetson-class hardware; compress with distillation or LoRA.
- Collaborate with perception and robotics teams to integrate research prototypes into live ROS 2 stacks.
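To give a flavor of the compression work mentioned above, here is a minimal NumPy sketch of the LoRA idea: a frozen weight matrix is adapted by adding a trainable low-rank update, so only a small fraction of parameters needs to be trained and stored. All names and shapes are illustrative, not part of any actual pipeline here.

```python
import numpy as np

def lora_adapt(W, A, B, alpha=16.0, r=4):
    """LoRA-style update: W' = W + (alpha / r) * B @ A.

    W: frozen base weight, shape (d_out, d_in)
    A: trainable down-projection, shape (r, d_in)
    B: trainable up-projection, shape (d_out, r)
    """
    return W + (alpha / r) * (B @ A)

rng = np.random.default_rng(0)
d_out, d_in, r = 64, 128, 4

W = rng.normal(size=(d_out, d_in))      # frozen base weight (not trained)
A = rng.normal(size=(r, d_in)) * 0.01   # small random init, trainable
B = np.zeros((d_out, r))                # zero init, trainable

W_adapted = lora_adapt(W, A, B, r=r)

# With B zero-initialized, the adapted weight equals the base weight,
# so training starts from the pretrained model's behavior.
assert np.allclose(W_adapted, W)

# The adapter adds r * (d_in + d_out) parameters vs. d_in * d_out in W.
adapter_params = A.size + B.size
print(f"adapter params: {adapter_params} vs base: {W.size}")
```

The appeal for Jetson-class deployment is that only `A` and `B` (here 768 values versus 8,192 in the base matrix) are trained and shipped per task, while the base weights stay shared.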