Join Observe.AI as a Lead DevOps Engineer and drive high-impact initiatives like GPU orchestration, self-hosting, and low-latency AI deployments while working closely with ML teams to productionize cutting-edge models.
Requirements
- 6+ years of experience in DevOps, SRE, or Cloud Infrastructure roles, preferably in AI or data-intensive environments.
- Strong expertise in Kubernetes (EKS, AKS preferred) for deploying AI workloads and managing GPU & non-CPU clusters.
- Experience with self-hosting services like Elasticsearch, Prometheus, Grafana, Kafka, etc.
- Hands-on expertise in Infrastructure as Code (Terraform, CloudFormation).
- Deep understanding of cloud platforms (AWS, Azure, GCP) and AI-focused services like Amazon SageMaker, Vertex AI, or Azure ML.
- Strong automation and scripting skills in Python, Bash, or Go.
- Experience with CI/CD tools (Jenkins, GitHub Actions, Argo CD, etc.) with a focus on AI model deployment.
- Strong leadership and mentorship skills to guide DevOps and ML teams.
- FinOps expertise for optimizing GPU and AI cloud compute costs.
- Familiarity with service meshes (Istio, Linkerd) and API gateways.
- Knowledge of compliance frameworks (SOC 2, ISO 27001, etc.) for AI data pipelines.
Benefits
- Excellent medical insurance options and free online doctor consultations
- Annual privilege and sick leave as per the Karnataka Shops and Commercial Establishments Act
- Generous holidays (national and festive) and parental leave policies
- Learning & Development fund to support your continuous learning journey and professional development
- Fun events to build culture across the organization
- Flexible benefit plans for tax exemptions (e.g., meal card, PF)