Senior Site Reliability Engineer responsible for owning the reliability, scalability, and operational excellence of critical enterprise platforms and shared capabilities.

Requirements

Design, implement, and operate reliable shared services platforms aligned to TR Technology standards, acting as the key point of escalation for any production‐related incidents.
Participate in on-call/shift rotations (L2).
Own service reliability outcomes, including availability, performance, latency, and capacity.
Implement site reliability engineering and DevOps best practices.
Feed non-functional requirements into the product backlog, including but not limited to high availability, scalability, self-healing, observability, and security.
Apply advanced monitoring, alert correlation, and root cause analysis techniques (build and maintain monitoring and alerting for all aspects of the infrastructure, microservices, and the platform).
Act as Incident Commander for high-severity incidents when required: troubleshoot and monitor until successful mitigation, communicate effectively, run the postmortem, and implement the learnings.
Apply AI/ML‐driven reliability and operational practices, including experience with AI‐powered monitoring, anomaly detection, incident triage, and predictive system analysis.
Collaborate with engineering and platform teams to integrate AI‐based automation into CI/CD, infrastructure management, and incident response workflows.
Focus on continuous improvement and technical standards: drive improvements in productivity, monitoring, and tooling, and set industry best practices.

Benefits

Flexible vacation
Two company-wide Mental Health Days off
Access to the Headspace app
Retirement savings
Tuition reimbursement
Employee incentive programs
Resources for mental, physical, and financial wellbeing

Job Details

About Thomson Reuters