Lead and manage end-to-end infrastructure for enterprise Gen AI applications hosted on OpenShift platforms, and design and operationalize Disaster Recovery infrastructure for Gen AI platforms.
Responsibilities
- Lead and manage end-to-end infrastructure for enterprise Gen AI applications hosted on OpenShift platforms.
- Design and operationalize Disaster Recovery (DR) infrastructure for Gen AI platforms, ensuring high availability and resilience.
- Manage certificate lifecycle (TLS/SSL), key management, and secrets handling across Gen AI applications and platforms.
- Implement and oversee vulnerability management, patching, and remediation across containers, Kubernetes, and underlying infrastructure.
- Support and coordinate penetration testing activities, addressing infrastructure-related findings and security gaps.
- Operate and support Control-M job scheduling, along with logging, monitoring, and alerting tools for platform observability.
Requirements
- Proven expertise managing OpenShift (OCP) environments in enterprise-scale production deployments.
- Experience with infrastructure-as-code tools such as Terraform and AWS CloudFormation.
- Hands-on experience with infrastructure setup and sizing, performance tuning, and capacity assessment for AI workloads.
- Experience supporting Oracle Database from an application infrastructure perspective.
- Practical knowledge of certificate management, secrets management, and key handling.
- Experience implementing CI/CD pipelines and infrastructure automation.
- Strong background in security, vulnerability management, and compliance controls.
- Proven experience designing and implementing DR infrastructure for mission-critical platforms.
- Experience working with AWS cloud services and hybrid cloud integrations.
- Strong coordination and leadership skills to work across Infrastructure, Network, Security, and Application teams.
- Experience with containerization and orchestration tools (Docker, Kubernetes).