Only immediate joiners or candidates who can join within 10 days should apply. We are seeking an experienced Site Reliability Engineer (SRE) to scale our operations, design and maintain resilient infrastructure, and apply best practices for reliability and efficiency in our cloud-native environment.

Requirements

Manage and maintain Kubernetes clusters across cloud platforms, including OpenShift, Amazon EKS, Azure AKS, and Google GKE.
Implement and manage CI/CD pipelines using tools such as Jenkins, GitHub Actions, Argo CD, or GitLab CI/CD.
Design and maintain observability stacks with tools including Prometheus, Grafana, Loki, OpenTelemetry, and related technologies.
Optimize system performance and resolve production issues.
Implement SRE principles, including Service Level Indicators (SLIs) and Service Level Objectives (SLOs), to uphold system reliability.
Automate infrastructure and operational tasks using programming languages such as Go or Python, and Infrastructure as Code (IaC) tools like Terraform.
Apply AI skills to engineering tasks, including vibe coding, AIOps and automation, an understanding of Large Language Models (LLMs) and AI agents, and proficiency in prompt engineering.
Remain current with emerging technologies, including AI, MLOps, and Edge Computing.
Contribute to knowledge sharing through technical writing and presentations.

Benefits

Competitive salary
Premium health insurance and various health & wellness benefits
Opportunity to work on cutting-edge technologies
Collaborative and supportive work environment
Chance to make a real impact on the company's success

Job Details

About CloudRaft