The Operations Support Engineer plays a key role in ensuring the reliability, stability, and efficiency of mission-critical systems. The role involves monitoring system performance, resolving technical issues, and implementing preventive measures to keep operations running smoothly and services highly available for users. Candidates with public sector experience are preferred.
Responsibilities
- Monitor and analyse product runtime environments (production and non-production) to ensure optimal system performance.
- Implement continuous improvement strategies to enhance system reliability and efficiency.
- Manage application and security incidents, performing problem determination and coordinating with internal teams and vendors for resolution.
- Develop and maintain operations and process guides to meet audit and compliance requirements.
- Lead and coordinate with operations teams and vendors to ensure 24/7 system support availability.
- Build self-healing systems with automated remediation for common alerts.
- Deploy full-stack monitoring with predictive analytics (e.g., Amazon CloudWatch anomaly detection, Google Cloud Monitoring (formerly Stackdriver), Azure Monitor).
- Serve as the bridge between application and infrastructure teams, enabling self-service troubleshooting.
- Train agency teams on operational best practices and tool adoption (e.g., ITSM workflows, DevOps pipelines).
Benefits
- Competitive salary
- Opportunity to work with a leading technology company
- Collaborative and dynamic work environment
- Professional development and growth opportunities