Bright Vision Technologies is a forward-thinking software development company dedicated to building innovative solutions that help businesses automate and optimize their operations. We're looking for a skilled Observability Engineer (Prometheus / Grafana / Datadog) to join our dynamic team and contribute to our mission of transforming business processes through technology.
Responsibilities
- Design and operate enterprise-grade observability platforms covering metrics, logs, traces, events, and synthetic monitoring.
- Architect Prometheus / Thanos / Mimir, Grafana, Loki, Tempo, OpenTelemetry, and Datadog deployments for high availability and scale.
- Develop standards for service instrumentation, including OpenTelemetry adoption, metric naming, label cardinality, and structured logging conventions.
- Define and enforce SLOs, SLIs, and error budgets, and build the dashboards and alerts that operationalize them.
- Build alerting strategies that minimize noise, surface actionable signals, and integrate cleanly with on-call workflows in PagerDuty, Opsgenie, or similar tools.
- Operate large-scale time-series and log storage platforms, balancing retention, query performance, and cost.
- Design distributed tracing pipelines and help teams use traces to diagnose latency and reliability issues.
- Develop self-service tooling, paved-road libraries, and templates that make adoption of observability standards easy for product teams.
- Drive cost management and label-cardinality discipline across the observability estate.
- Improve incident-response readiness through better dashboards, alerting hygiene, and post-incident analysis tooling.
- Partner with SRE and platform teams to integrate observability into deployment pipelines, canary analysis, and progressive delivery workflows.
- Evaluate and recommend observability vendors and open-source tools based on cost, capability, and operational maturity.
- Mentor engineering teams on observability fundamentals, debugging techniques, and SLO-driven operations.
- Maintain documentation, onboarding guides, and runbooks for the observability platform.
Benefits
- Generous Paid Time Off
- 401(k) Matching
- Retirement Plan
- Health Insurance