EchoTwin AI is pioneering AI-driven infrastructure intelligence, transforming cities into resilient digital twins. The company is seeking a Vision Language Model Engineer to design, develop, and optimize advanced vision-language models for applications such as image captioning and visual question answering. The role involves collaborating with cross-functional teams and contributing to the data pipeline.
Responsibilities
- Design and implement state-of-the-art vision-language models.
- Develop and fine-tune models for image captioning, visual question answering, and text-to-image generation.
- Collaborate with data scientists and software engineers.
- Optimize model performance for accuracy, latency, and scalability.
- Conduct experiments and iterate on architectures.
- Stay up to date with the latest research in vision-language models.
Benefits
- Medical, dental, and vision coverage
- Flexible Spending Account (FSA)
- Dependent Care Flexible Spending Account (DCFSA)
- 401(k) with company matching
- Profit sharing