Core & ML Ops Team Lead - Remote

Zyte

Budapest, Hungary

Posted November 6, 2025

Full Time

Job Description

Zyte is seeking an experienced Team Lead to manage our Core & MLOps Squad, responsible for building the bedrock infrastructure that powers Zyte at scale. This hands-on technical leadership role requires expertise across MLOps, systems programming, and orchestration to lead a cross-functional team in designing and maintaining the scalable foundation that enables all Zyte teams to build and run their services with confidence.

Responsibilities and Requirements

Design and evolve the core platform (Kubernetes, Mesos, GPU scheduling/autoscaling, distributed compute).
Own the model platform: registry, experiment tracking, training orchestration, evaluation, serving, and monitoring.
Build the Golden Path: reference repos, a scaffold CLI, opinionated CI/CD pipelines, runtime contracts (health/metrics/tracing/SLOs), high-performance clients, circuit breakers, and other production-ready defaults.
Operate a secure, multi-tenant model registry and training platform with standardized experiment/evaluation harnesses.
Provide turnkey serving patterns (online + batch), drift/quality monitoring, and rollback playbooks.
Integrate public/open-source AI capabilities as managed platform services with cost and data-governance guardrails.
Run the squad: roadmap/prioritization, delivery, mentoring, and high engineering standards.
Partner with product engineering (Zyte API, Scrapy Cloud), Prod Ops, and Security on adoption and rollout plans.
Mentor the team and foster a platform-thinking mindset.
Ownership Areas:
Container orchestration (Kubernetes/Knative), GPU provisioning & autoscaling, environment & secret management.
Operators, sidecars, and internal SDKs/libraries (Go/Rust/Python/Java) that enforce the golden path contract.
Model platform: registry, experiment tracking, training orchestration, evaluation framework, serving infra, model monitoring.
Observability: logging, metrics, and tracing pipelines.
Billing pipeline: metering/events/cost tracking abstractions.
Golden Path: Java, Python, and ML templates, CI/CD blueprints, docs, and a scaffold CLI.
Reliability enablement (SRE practices), cost governance, supply-chain security (SBOM, image signing).
Qualifications:
5+ years of experience building distributed systems; 3+ years in MLOps/ML platform engineering (or equivalent impact).
Knowledge of Linux/OS internals (process model, cgroups/namespaces), networking (TCP/IP, HTTP/2), concurrency, and performance profiling.
Deep understanding of Kubernetes (bonus: Mesos).
Proficiency developing high-performance services in Java, Rust, Go, or C++ (bonus: familiarity with the Vert.x and Netty frameworks); strong Python skills.
Experience with GPU infrastructure (scheduling, containerization, optimization).
Track record of designing and operating model platforms (registry, training, serving, monitoring) in production.
Demonstrated success leading technical teams and implementing organization-wide platform solutions.

Benefits

We love fostering and nourishing new ideas and bringing them to market.
Become part of a self-motivated, progressive, multi-cultural team.
Have the freedom and flexibility to work from where you do your best work, as we are a completely remote company.
Get the chance to work with cutting-edge open-source technologies and tools.