Tekion is seeking a Staff Machine Learning Engineer to build and operate the production backbone that takes models from Applied Sciences and delivers reliable, low-latency ML services across Tekion's DMS, CRM, Digital Retail, Service, Payments, and enterprise products.
Requirements
- 8-10 years of experience in ML engineering/MLOps, or in backend/platform engineering with production ML
- Experience with LLMs, retrieval systems, vector stores, and graph/knowledge stores
- Strong software engineering fundamentals: Python plus one of Java/Go/Scala; API design; concurrency; testing
- Hands-on experience with orchestration frameworks and libraries (LangChain, LlamaIndex, OpenAI function calling, AgentKit, etc.)
- Knowledge of agent architectures (reactive, planning, retrieval-augmented) and safe execution patterns
- Pipelines and data: Airflow/Kubeflow or similar; Spark/Flink; Kafka/Kinesis; strong data quality practices
- Microservices and runtime: Docker/Kubernetes, service meshes, REST/gRPC; performance and reliability engineering
- Model ops: experiment tracking, registries (e.g., MLflow), feature stores, A/B and shadow testing, drift detection
- Observability: OpenTelemetry/Prometheus/Grafana; debugging latency, tail behavior, and memory/CPU hotspots
- Cloud: AWS preferred (IAM, ECS/EKS, S3, RDS/DynamoDB, Step Functions/Lambda), with cost optimization experience
- Security/compliance: secrets management, RBAC/ABAC, PII handling, auditability
Benefits
- Paid time off
- Health insurance
- Dental insurance
- Vision insurance
- Retirement plan