We're looking for an ambitious and technically outstanding Senior Software Engineer, Machine Learning to join our Enterprise Solution Engineering Team, AIXON. You'll design, build, and optimize scalable, high-performance ML infrastructure to power transformative AI solutions.
Requirements
- Architect and operate resilient ML job execution frameworks covering training, inference, and post-processing workflows.
- Develop and maintain API services and developer tooling to orchestrate ML workflows on Kubernetes using Argo Workflows, Helm, and Terraform.
- Build scalable, efficient batch pipelines with Apache Spark to support large-scale ML training and evaluation.
- Design and maintain robust data infrastructure using Trino, Databricks, and other modern database technologies, monitored with Prometheus and Grafana for high availability and observability.
- Develop tooling that streamlines ML experimentation, accelerates production workflows, and empowers cross-functional teams to innovate rapidly.
- Collaborate deeply with ML scientists to transform research prototypes into reliable, scalable, user-facing AI products.
- Lead cloud infrastructure design and operations on GCP, leveraging managed services such as Google Compute Engine (GCE), Google Kubernetes Engine (GKE), Cloud Storage, Cloud Functions, Cloud Pub/Sub, Cloud SQL, BigQuery, and more.
- Define and implement CI/CD pipelines with tools like Jenkins, GitHub Actions, or Argo CD to enable seamless, automated deployments.
- Harness distributed computing and parallel programming principles to optimize system resource utilization and performance.