Bright Vision Technologies is a software development company seeking an AI Data Infrastructure Engineer to build and operate large-scale data systems for AI training and evaluation pipelines.
Responsibilities
- Design and operate large-scale data pipelines supporting AI training, evaluation, and continual improvement workflows.
- Build ingestion systems for diverse modalities including text, image, audio, video, and structured signals.
- Implement data cleaning, deduplication, filtering, and quality assurance at petabyte scale.
- Develop dataset versioning, lineage, and provenance tracking systems suitable for reproducible training.
- Build high-throughput data loading systems that maximize GPU utilization during training.
- Implement labeling workflows, active learning pipelines, and human-in-the-loop data improvement systems.
- Design storage architectures balancing cost, throughput, and latency across data tiers.
- Build evaluation dataset construction pipelines with strict integrity and contamination controls.
- Implement data privacy, redaction, and consent enforcement throughout the pipeline.
- Collaborate with ML researchers and engineers to align data systems with model development needs.
- Drive observability of data quality, drift, and pipeline health across the AI data estate.
- Optimize cost and performance through compression, format selection, and caching strategies.
- Document data systems, schemas, and operational procedures for broad internal use.
- Stay current with AI data infrastructure research and emerging open-source tools.
Benefits
- Competitive base salary commensurate with experience, plus benefits.