Join our central DevOps Engineering Services organization at Swift, committed to reshaping the developer experience. As a Site Reliability Engineer, you'll be pivotal in crafting end-to-end delivery pipelines, ensuring seamless integration and deployment of infrastructure and software, and providing essential maintenance and support to our developer community.
Requirements
- Contribute to deployment phases with a focus on scalability, reliability, and operability of ELK and Kafka solutions.
- Develop automation scripts, infrastructure as code, and tooling using industry best practices to improve system reliability, reduce manual effort, and enable self-service.
- Analyze production issues, identify root causes, and implement long-term reliability improvements through automation, alerting, monitoring, and architectural enhancements.
- Work collaboratively with other team members and mentor more junior colleagues.
- Organize an efficient handover through high-quality documentation and training.
- Automate the deployment and operation of multi-tenant infrastructure, handling tasks that ensure system resilience and availability.
- Develop and maintain monitoring tools, dashboards, and self-healing mechanisms.
- Participate in on-call rotations and weekend deployment duty, conduct blameless postmortems, and drive continuous learning.
- Work closely with developers, product teams, and engineering stakeholders to troubleshoot issues, improve systems, and integrate reliability improvements.
- Collaborate with technical teams on operational concerns of integration solutions on the ELK platform.
Benefits
- Competitive package
- Freedom to be yourself
- Diverse and inclusive environment
- Opportunity to make a difference
- Career development opportunities