We are seeking a Senior Site Reliability Engineer (Senior SRE) to drive the scalability, reliability, and efficiency of our critical systems and infrastructure. As a senior member of the team, you will lead SRE initiatives, mentor engineers, and architect solutions that enhance system resilience and operational excellence.

Requirements

Bachelor's or Master's degree in Computer Science or a related field
6+ years of experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering
Strong experience with cloud platforms (AWS, GCP, or Azure) and cloud-native technologies
Expertise in Kubernetes and container orchestration
Expertise with log management tools like ELK or Graylog
Strong coding/scripting skills in Python, Go, or Bash for automation
Deep understanding of networking, DNS, CDN, load balancing, and security
Proven experience with observability tools (Prometheus, Grafana, ELK, OpenTelemetry)
Hands-on experience in performance tuning, high availability, and disaster recovery (DR) strategies
Strong knowledge of incident management frameworks and reliability metrics (SLOs, SLIs, SLAs)
Experience leading cross-functional reliability initiatives

Responsibilities

Implement security best practices, including infrastructure hardening, zero-trust principles, and identity management
Ensure compliance with SOC 2 and ISO 27001
Mentor and coach junior SREs, fostering a culture of reliability
Work closely with development teams to ensure reliability is built into the software lifecycle
Advocate for chaos engineering, game days, and resilience testing to enhance system robustness

Job Details

About Josys