Innodata is building a team of Language Data Scientists and GenAI experts to help our customers advance GenAI applications. As a Senior Language Data Scientist, you will lead projects and own processes for creating, validating, and annotating data for use in LLM/ML applications.
Requirements
- MA in (computational) linguistics, data science, computer science (AI / ML / NLU), quantitative social sciences, or a related scientific / quantitative field; PhD strongly preferred
- Collaborating with cross-functional teams to define AI project requirements and objectives, ensuring alignment with overall business goals
- Developing clear and concise documentation, including technical specifications, user guides, and presentations, to communicate complex AI concepts to both technical and nontechnical stakeholders
- Familiarity with GenAI technologies that enables you to improve existing processes to handle future challenges
- Language and language data expertise: Extensive experience working with human language data and designing human evaluation tasks, including multi-phase and complex workflows
- Deep understanding of language and its relationship with culture
- Ability to identify ambiguity and subjectivity in language
- Ability to work on multilingual and multimodal projects
- Quantitative analysis skills: Advanced knowledge of statistics, metrics (e.g. F1 score, inter-rater reliability metrics), and data analysis methods such as sampling
- Technical skills:
- Experience with Natural Language Processing (NLP) techniques and tools, such as spaCy, NLTK, or Hugging Face
- Proficiency in Python to handle and transform large datasets (e.g. pre- and postprocessing data with pandas), perform quantitative analyses, and visualize data (e.g. with matplotlib or seaborn)
- Data processing: Deep understanding of data pipelines that support ML and NLP workflows; knowledge of efficient data collection, transformation, and storage
- Knowledge of data structures, algorithms, and data engineering principles
- Excellent interpersonal skills for effective cross-functional stakeholder engagement
- Excellent problem-solving skills, with the ability to think critically and creatively to develop innovative AI solutions
- Ability to work independently and collaborate as part of a team
- Adaptable to changing technologies and methodologies
- Ability to draw on experience and on research and development information to understand client products and services
- Providing technical mentorship and guidance to junior team members