We’re looking for a Research Scientist with deep expertise in training and fine-tuning large Vision-Language Models (VLMs) and Large Language Models (LLMs) for downstream multimodal tasks.
Requirements
- PhD (or equivalent experience) in Machine Learning, Computer Vision, NLP, Robotics, or Computer Graphics.
- Proven track record in fine-tuning or training large-scale VLMs / LLMs for real-world downstream tasks.
- Strong engineering mindset: you can design, debug, and scale training systems end-to-end.
- Deep understanding of multimodal alignment and representation learning (vision–language fusion, CLIP-style pre-training, retrieval-augmented generation).
- Familiarity with recent trends, including video-language and long-context VLMs, spatio-temporal grounding, agentic multimodal reasoning, and Mixture-of-Experts (MoE) fine-tuning.
- Exposure to 3D-aware multimodal models that use NeRFs, Gaussian splatting, or differentiable renderers for grounded reasoning and 3D scene understanding.
- Hands-on experience with PyTorch / DeepSpeed / Ray and distributed and mixed-precision training (a minimal sketch follows this list).
- Excellent communication skills and a collaborative mindset.
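To give a concrete flavor of the engineering side of the role, here is a minimal, hypothetical sketch of the mixed-precision training loop mentioned above. It is an illustration only, not our production stack: the toy model, synthetic batch, and hyperparameters are placeholders, and a CUDA device is assumed. In practice you would scale this pattern across nodes with DeepSpeed or Ray.

```python
# A minimal sketch of mixed-precision fine-tuning in PyTorch, assuming a CUDA
# device; the toy model, synthetic batch, and hyperparameters are placeholders.
import torch
from torch import nn
from torch.cuda.amp import GradScaler, autocast

model = nn.Sequential(nn.Linear(512, 512), nn.GELU(), nn.Linear(512, 10)).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = GradScaler()  # keeps fp16 gradients from underflowing via loss scaling

for step in range(100):
    x = torch.randn(32, 512, device="cuda")          # stand-in for real features
    y = torch.randint(0, 10, (32,), device="cuda")   # stand-in for real labels
    optimizer.zero_grad(set_to_none=True)
    with autocast():                     # forward pass runs in reduced precision
        loss = nn.functional.cross_entropy(model(x), y)
    scaler.scale(loss).backward()        # scale the loss, then backpropagate
    scaler.step(optimizer)               # unscales gradients before the step
    scaler.update()                      # adapt the loss-scale factor over time
```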