Responsibilities
- Design and scale robust evaluation systems to measure AI performance and reliability
- Lead efforts to build human-in-the-loop and automated annotation pipelines
- Define and implement continuous evaluation workflows
- Analyze model outputs for correctness, bias, safety, and reliability
Requirements
- 6+ years of experience building large-scale distributed systems or machine learning systems in production environments
- Experience designing infrastructure to support AI/ML model evaluation, annotation, or benchmarking workflows
- Strong understanding of AI/ML concepts, including evaluation metrics, prompt analysis, and trust & safety challenges
- Ability to use AI coding tools in day-to-day workflows and to validate, critique, and refine AI-generated output
Benefits
- New hire stock equity (RSUs) and employee stock purchase plan (ESPP)
- Continuous professional development, product training, and career pathing
- Intradepartmental mentor and buddy program for in-house networking
- An inclusive company culture, with the ability to join our Community Guilds (Datadog employee resource groups)
- Access to Inclusion Talks, our internal panel discussions
- Free, global mental health benefits for employees and dependents age 6+
- Competitive global benefits