Join Oracle's Health Data Intelligence (HDI) team as a Principal Infrastructure & Reliability Engineer, designing and operating highly reliable, scalable infrastructure and data pipelines for large-scale healthcare analytics platforms. You will advance automation, observability, and AI-assisted reliability practices, including exploring Generative AI and intelligent automation for incident response, system resilience, and operational efficiency.
Requirements
- Experience building and operating high-availability, fault-tolerant systems
- Strong understanding of distributed systems, performance monitoring, and resiliency patterns
- Experience with incident response, root-cause analysis, and production troubleshooting
- Hands-on experience applying Generative AI or Agentic AI to infrastructure lifecycle management, observability, and incident response
- Strong experience with multi-cloud environments (OCI, AWS/Azure)
- Deep understanding of cloud infrastructure design, deployment, and resource optimization
- Advanced competency in CI/CD pipelines (Jenkins, Kubernetes)
- Experience with Infrastructure as Code (Terraform)
- Experience with observability tools (Prometheus, Grafana)
- Proficiency in data warehousing platforms (e.g., Vertica, Snowflake)
- Experience with ETL frameworks and large-scale data processing
- Understanding of columnar storage systems
- Experience supporting or integrating BI and reporting tools (Tableau, Power BI, Oracle Analytics)
- Strong proficiency in Python, Java, or Go
- Experience with Docker, Kubernetes, and shell scripting
Benefits
- Flexible medical, life insurance, and retirement options
- Volunteer programs
- Diversity and inclusion initiatives