TechInsights is building the reliability and AI operations foundation for its next chapter — an AI-first intelligence platform that runs the most demanding semiconductor intelligence workflows in the world. We're looking for a Senior Site Reliability Engineer who wants to own that foundation.
Requirements
- Bachelor's degree in Computer Science, Engineering, or equivalent combination of education and experience
- 6–8 years of progressive experience in site reliability engineering, platform engineering, or DevOps, with demonstrated technical leadership at the senior individual contributor level
- Deep expertise in AWS (EKS, Lambda, CloudWatch, AWS Config) and multi-region architecture patterns
- Proficiency with Terraform and GitOps; experience with policy-as-code (Sentinel, OPA/Rego, or equivalent)
- Hands-on Datadog experience at operational depth: dashboards, SLO tracking, alerting, log management, distributed tracing
- Strong containerization expertise: Docker, Kubernetes (EKS preferred)
- Proficiency in Python and/or Bash, with experience building operational tooling
- Solid understanding of Java and Spring Boot microservice architecture, sufficient to make reliability and deployment decisions for EKS-hosted services
- Deep expertise in CI/CD pipeline design and optimization using Bitbucket Pipelines and GitHub Actions
- Familiarity with internal developer platform (IDP) tooling such as Backstage, Atlassian Compass, or equivalent strongly preferred
- Experience with AI/ML workload infrastructure, LLM API integration, or agentic system operations considered a strong asset
Benefits
- Company-sponsored training and development opportunities
- Comprehensive benefits package (health, dental, vision, wellness, RRSP matching, annual fitness reimbursement)
- Flexible vacation policy
- Community involvement opportunities through charitable alliances: https://www.techinsights.com/community-involvement
- Wellness resources and support
- Inclusive environment that prioritizes diversity, equity, and accessibility