TextNow is looking for a motivated Site Reliability Engineer to own infrastructure, monitoring, logging, CI/CD, reliability, and everything in between. The role is about impact at scale: shaping how TextNow builds and operates its systems in an AI-first environment.
Responsibilities
- Ensure System Reliability: Design, build, and maintain scalable, resilient, and highly available systems to support TextNow’s infrastructure and services.
- Automation & Infrastructure as Code: Develop and maintain automation using Terraform, Ansible, and other tools to enable efficient deployment, scaling, and operations of cloud-based systems (AWS preferred).
- Incident Response & On-Call Support: Participate in an on-call rotation, troubleshoot issues, and drive incident resolution to minimize downtime and improve system performance.
- Performance Monitoring & Optimization: Implement and improve observability tools, logging, and monitoring solutions to identify and mitigate potential system issues proactively.
- Collaboration & Cross-Team Engagement: Work closely with software engineers, DevOps, and product teams to align technical efforts with business objectives and improve system reliability from development to production.
- Continuous Improvement: Identify areas for improvement in architecture, automation, and operational practices. Contribute to the design and implementation of new SRE best practices.
Benefits
- Free phone service
- Strong work-life blend
- Flexible work arrangements (work-from-home, remote, or access to one of our office spaces)
- Employee stock options
- Unlimited vacation
- 12 paid holidays per year
- Competitive pay
- Health, dental, and vision benefits
- Short-term & long-term disability
- $750 annual wellness benefit or healthcare spending account
- RRSP matching (Canada) | 401(k) (USA)
- Parental leave for eligible employees
- Learning & Development opportunities