Amgen is hiring an Associate Machine Learning Engineer to work on next-generation capabilities and services in Applied AI & Automation using innovative COTS products, open-source software, frameworks, tools, and cloud computing services. The role emphasizes building and scaling AI and machine learning solutions from development to production, with a focus on operational excellence, observability, and compliance for applied AI and GenAI systems.
Requirements
- Collaborate with data scientists to develop, train, and evaluate machine learning models.
- Build and maintain MLOps pipelines, including data ingestion, feature engineering, model training, deployment, and monitoring.
- Leverage cloud platforms (AWS, Databricks) for ML model development, training, and deployment.
- Develop solutions using a DevSecOps framework that are secure, scalable, reliable, and aligned with enterprise architecture standards.
- Evaluate model performance using appropriate metrics and optimize models for accuracy and efficiency.
- Develop and execute unit tests, integration tests, and other testing strategies to ensure the quality of the software.
- Create and maintain documentation on software architecture, design, deployment, disaster recovery, and operations.
- Identify and resolve technical challenges effectively.
- Provide ongoing support and maintenance for applications, ensuring that they operate smoothly and efficiently.
- Analyze customer feedback and support data to identify pain points and opportunities for improvement.
- Evaluate and recommend technologies and tools that best fit the solution requirements.
- Support operationalization of machine learning and GenAI models developed by data scientists and solution teams.
- Assist in evaluating model and LLM performance using metrics related to reliability, efficiency, and response quality.
- Support deployment and operation of LLM-based workflows, including prompt configurations, retrieval-augmented generation (RAG) pipelines, and agent-based automations.
- Assist with monitoring AI and LLM systems for availability, latency, error rates, and quality degradation.
- Support model, prompt, and pipeline versioning across development, test, and production environments.
- Participate in incident triaging, root cause analysis, and rollback or mitigation activities for AI services.
- Assist with evaluation runs for LLM outputs, including grounding, reliability, and safety checks.
- Follow established AI governance, security, and compliance standards when operating AI and GenAI solutions.
- Monitor AI and LLM endpoints for availability, latency, throughput, and error rates using enterprise monitoring tools.
- Assist with dashboards, alerts, runbooks, and operational documentation to support reliable AI system operations.
Benefits
- Generous Paid Time Off
- 401k Matching
- Retirement Plan
- Visa Sponsorship
- Four Day Work Week
- Generous Parental Leave
- Tuition Reimbursement
- Relocation Assistance