Join an intellectually stimulating work environment and be a pioneer in AI benchmarking and dataset engineering. Collaborate with our R&D team to build evaluation infrastructure for post-transformer models.
Responsibilities
- Proactively identify, prioritize, and curate relevant public and client-driven benchmarks
- Evaluate candidate benchmarks for clarity, data quality, evaluation methodology, and fit with our model roadmap
- Run benchmarks with baseline models to validate setup, uncover edge cases, and de-risk R&D runs
- Hand off "benchmark-ready" packages to R&D (specs, data, evaluation scripts, expected metrics, constraints)
- Maintain a shared vocabulary and documentation around benchmarks, datasets, and evaluation formats
- Track and organize benchmark results, model leaderboards, and "what good looks like" for different customers and scenarios
- Contribute to demos and public-facing proof points based on benchmark outcomes
Benefits
- Full-time, permanent contract
- Remote work
- Option to work from or meet with other team members at one of our offices: Palo Alto, CA; Paris, France; or Wroclaw, Poland
- Flexible compensation based on your profile and location