The Senior Data Architect will own the Training Environment data architecture, defining dataset design and schema, data selection and sampling strategy, and data catalog and dataset discovery infrastructure. They will work closely with ML engineers to understand training data requirements and translate them into concrete dataset specifications and pipeline configurations.
Requirements
- 5+ years in data architecture, data engineering, or LLM/ML data infrastructure, with demonstrated ownership of production data systems serving ML/AI model development
- Strong understanding of ML training data requirements — what makes training data high-quality, diverse, and useful for LLM and NLU model development, not just clean and well-structured
- Deep experience with data modeling, schema design, and data pipeline architecture
- Strong proficiency with Snowflake, AWS S3, and ETL/ELT orchestration tools (Airflow, dbt, or similar)
- Experience defining annotation requirements and managing data annotation workflows — intent labeling, entity tagging, dialog classification, or similar NLP annotation tasks
- Experience with data cataloging, metadata management, and dataset discovery at scale
- Strong SQL and Python skills for data pipeline development and data quality analysis
- Experience with data quality and curation techniques such as deduplication, sampling strategies, and diversity optimization
Benefits
- Fixed compensation
- Long-term employment with paid vacation (counted in working days)
- Professional development opportunities (courses, training, etc.)
- The chance to work on successful, cutting-edge technology products that are making a global impact in the service industry
- Proficient and fun-to-work-with colleagues
- Apple gear