Lead Software Engineer (SRE, OMS) responsible for ensuring high availability and reliability of mission-critical systems, providing technical guidance, and developing observability solutions.
Responsibilities
- Ensure high availability and reliability of mission-critical systems
- Provide technical guidance to the team
- Develop and implement observability solutions
- Optimize system performance
- Work with engineering teams to improve system architecture
- Enhance system alerting and proactive monitoring
- Develop automation scripts and tools
- Build self-healing and auto-remediation workflows
- Lead incident response efforts
- Establish and enhance post-mortem and preventive action processes
- Create custom dashboards and analytics solutions
- Implement intelligent alerting mechanisms
- Define and track critical business and system SLIs, SLOs, and error budgets
- Mentor and guide team members
- Improve collaboration and communication across engineering, support, and product teams