Bright Vision Technologies is a forward-thinking software development company seeking a skilled Site Reliability Engineer (SRE) to join its team and contribute to its mission of transforming business processes through technology.
Responsibilities
- Define, instrument, and continually refine service-level objectives (SLOs), service-level indicators (SLIs), and error budgets for critical services.
- Lead incident response and resolution for production issues.
- Design and implement comprehensive monitoring, logging, and tracing strategies.
- Build and maintain robust on-call processes, runbooks, and escalation paths.
- Aggressively automate away operational toil by writing production-grade tooling.
- Architect and operate large-scale Kubernetes clusters and container-based workloads.
- Design CI/CD pipelines that promote safe, frequent, and observable releases.
- Lead capacity planning and performance engineering activities.
- Partner closely with application development teams to embed reliability practices early in design.
- Strengthen the platform’s resiliency through chaos engineering, fault injection, dependency isolation, retries, timeouts, circuit breakers, and well-tested failover paths.
- Drive continuous improvement of the platform’s security posture in collaboration with security teams.
- Contribute to the technical roadmap for reliability tooling, observability platforms, and developer-experience improvements.
- Mentor engineers across the organization on SRE practices and foster a strong, blameless culture of operational excellence.
Benefits
- Competitive base salary commensurate with experience, plus benefits.