Join our central DevOps Engineering Services organization at Swift, committed to reshaping the developer experience. As a Site Reliability Engineer, you'll be pivotal in crafting end-to-end delivery pipelines, ensuring seamless integration and deployment of infrastructure and software, and providing essential maintenance and support to our developer community.

Requirements

Contribute to deployment phases with a focus on scalability, reliability, and operability of ELK and Kafka solutions.
Develop automation scripts, infrastructure as code, and tooling using industry best practices to improve system reliability, reduce manual effort, and enable self-service.
Analyze production issues, identify root causes, and implement long-term reliability improvements through automation, alerting, monitoring, and architectural enhancements.
Work collaboratively with other team members and provide guidance to more junior team members.
Organize an efficient handover through high-quality documentation and training.
Automate the deployment and operation of multi-tenant infrastructure, handling tasks that ensure system resilience and availability.
Develop and maintain monitoring tools, dashboards, and self-healing mechanisms.
Participate in on-call rotations and weekend deployment duty, conduct blameless postmortems, and drive continuous learning.
Work closely with developers, product teams, and engineering stakeholders to troubleshoot issues, improve systems, and integrate reliability improvements.
Collaborate with technical teams on operational concerns of integration solutions on the ELK platform.

Benefits

Competitive package
Freedom to be yourself
Diverse and inclusive environment
Opportunity to make a difference
Career development opportunities

Senior Site Reliability Engineer - ELK


Job Details

About Swift