Tecsys is seeking a Site Reliability Engineer to join their Network and Security Operations Center (NOC) team. The successful candidate will help maintain, optimize, and ensure the reliability and performance of the systems that power their cloud infrastructure across AWS and Kubernetes.
Requirements
- 5+ years in Site Reliability, Cloud, or DevOps Engineering
- Experience designing and deploying large-scale systems, multi-vendor platforms, and globally distributed infrastructure
- Proven experience managing cloud infrastructure in AWS (multi-account, VPC, EC2, EKS) and Kubernetes at scale
- Strong hands-on experience with IaC and automation (Terraform, Ansible, or similar)
- Familiarity with CI/CD pipelines and release automation (GitLab preferred, Jenkins acceptable)
- Deep understanding of monitoring and observability using Datadog (or equivalent), including metric design, log pipelines, alerting, and dashboards
- Experience with incident management, on-call participation, escalation, and structured postmortems
- Scripting skills in Python, Bash, Java, or equivalent for automation and diagnostics
- Basic knowledge of Java- or .NET-based development required
- Strong English communication skills, both written and spoken
Benefits
- Generous Paid Time Off
- 401k Matching
- Retirement Plan
- Visa Sponsorship
- Four Day Work Week
- Generous Parental Leave
- Tuition Reimbursement
- Relocation Assistance