We're looking for an MLOps / Data Engineer to join our team, designing and automating CI/CD pipelines, optimizing large-scale data processing with Apache Spark, and leveraging Databricks to deliver machine learning solutions. As a bridge between data science and production systems, you'll ensure that models thrive in real-world environments.
Responsibilities
- Design, implement, and maintain CI/CD pipelines for machine learning workflows using tools like GitHub Actions, Azure DevOps, or Jenkins.
- Build and optimize data processing pipelines in Apache Spark (PySpark and Scala) for large-scale, distributed listener datasets.
- Deploy and manage Databricks environments, ensuring efficient cluster usage, job scheduling, and cost optimization.
- Collaborate with data scientists to productionize ML models, integrating them into scalable APIs or batch processing systems that feed real-time, machine-readable audience signals.
- Implement automated testing, monitoring, and alerting for ML pipelines to ensure the reliability and reproducibility that certified buyers require.
- Champion best practices in version control, model registry management, and environment reproducibility.
- Help evolve our listener data infrastructure toward agent-compatible supply: live, structured, queryable data feeds that autonomous buying systems can discover and act on without human mediation.
Benefits
- Fully remote position
- 4 weeks of vacation + 5 paid personal days annually
- Group insurance programs as of your first day, including access to telemedicine and an EAP
- Collective RRSP with employer matching contributions
- Internet reimbursement