We are seeking a Site Reliability Engineer (SRE) to ensure the reliability, scalability, and performance of our production systems through automation, observability, and operational excellence. As an SRE, you will work closely with our product development team to integrate observability, reliability, and security considerations into the entire software development lifecycle.
Responsibilities
- Design, implement, and maintain scalable and reliable infrastructure.
- Collaborate with engineering and product teams to build observability, reliability, and security into every stage of the software development lifecycle.
- Develop and implement automation tools for monitoring, deployment, and incident response to ensure efficient and reliable operations.
- Lead and participate in post-incident reviews to learn from operational surprises and drive actionable improvements to system reliability.
- Proactively identify and resolve performance bottlenecks and system issues.
- Conduct regular security assessments and audits to mitigate risks.
- Champion and embed a culture of reliability across the organization.
- Implement and manage Infrastructure as Code (IaC) using Ansible and other industry-standard tools.
- Implement and enforce cloud security best practices, including identity and access management (IAM), encryption, and network security.
- Develop dashboards and alerts to ensure real-time visibility into system operations.
- Stay current with emerging cloud technologies and recommend improvements to existing systems.
Benefits
- Flexible working hours
- Free snacks and beverages
- Regular team events
- Modern office environment
- Mental health counselling
- Home office setup budget
- 25 vacation days
- Additional day off on your birthday