
Job description
We are seeking a hands-on AI/ML Engineer specializing in MLOps and Site Reliability Engineering (SRE) to build, operate, and continuously improve production-grade machine learning systems. In this role, you will partner with data scientists, data engineers, and software teams to standardize the ML lifecycle, improve reliability and performance, and enable rapid, safe delivery of models and AI services at scale.
Design and implement reusable MLOps platform capabilities for training, deployment, and monitoring of ML/LLM systems. Deploy models and AI services using containers and orchestration (e.g., Kubernetes) with robust rollout strategies (blue/green, canary, A/B). Implement end-to-end observability: structured logging, metrics, tracing, dashboards, and alerting for both infrastructure and model behavior.
You will have the opportunity to work with a global leader in diversified electronics for the semiconductor manufacturing ecosystem, as part of a team that thrives on tackling hard problems and has a strong focus on innovation and R&D.
Company

Manufacturing • Tech, Software & IT Services
KLA is a global leader in semiconductor process control, delivering advanced equipment and services that drive innovation across the electronics industry. The company specializes in process-enabling solutions for wafer and reticle manufacturing, integrated circuits, packaging, and printed circuit boards, leveraging cutting-edge metrology, inspection, and AI-powered analytics. KLA’s multidisciplinary teams—physicists, engineers, data scientists, and problem-solvers—partner with leading customers worldwide to design solutions that push the boundaries of performance and yield. Its distinctive blend of industry-leading technology, deep scientific expertise, and collaborative innovation positions KLA at the forefront of semiconductor and emerging nanotechnology markets.