We are seeking an experienced Site Reliability Engineer / Platform Engineer to join our team and help build and maintain resilient, scalable infrastructure that supports our applications across multiple cloud providers.

Requirements

Design, build, and maintain infrastructure across AWS, GCP, and Azure using Infrastructure as Code (IaC) principles
Implement and optimize CI/CD pipelines using tools like Argo and CircleCI to enable rapid, reliable deployments
Manage and scale Kubernetes clusters in production environments, ensuring high availability and optimal resource utilization
Administer and optimize cloud databases including MongoDB, Redis, RDS, and other data stores for performance and reliability
Develop monitoring, alerting, and observability solutions to identify and resolve issues before they impact users
Automate routine operational tasks to reduce manual toil and improve system reliability
Conduct incident response and post-mortem analysis to drive continuous improvement
Collaborate with development teams to design systems with reliability, scalability, and operational excellence in mind
Document infrastructure architecture, runbooks, and operational procedures
Evaluate and implement new tools and technologies to improve platform capabilities

Benefits

Competitive salary
Comprehensive benefits package
Opportunity to work with cutting-edge cloud technologies and tools
Collaborative environment focused on knowledge sharing and professional growth
Remote or flexible work arrangement
Continuous learning and development opportunities

Job Details

About DevRev