The Software Engineer / Site Reliability Engineer (SRE) will play a critical role in driving reliability, scalability, and performance for the Banking Solutions, Payments, and Capital Markets platforms. This role blends core SRE principles, performance engineering, and service health management to support large-scale, mission-critical systems.
Requirements
- Strong experience in core SRE practices, including reliability engineering, incident management, and automation.
- Proven hands-on experience in Performance Engineering / Performance Testing for large-scale distributed systems.
- Deep understanding of, and implementation experience with, SLI / SLO / Error Budget frameworks.
- Proficiency in cloud platforms (AWS, Azure, or Google Cloud).
- Hands-on experience with containerization and orchestration (Docker, Kubernetes).
- Strong background in monitoring, observability, and logging, using tools such as Prometheus, Grafana, Datadog, Splunk, and the ELK Stack.
- Experience with CI/CD pipelines (Jenkins, GitLab CI/CD, Azure DevOps).
- Proficiency in scripting and infrastructure automation using Python, Bash, Terraform, and Ansible.
- Strong troubleshooting skills across application, infrastructure, and network layers.
- Experience designing and running incident response and post-mortem reviews.
- Ownership mindset with accountability for service reliability and customer outcomes.
- Excellent communication, collaboration, and stakeholder management skills.