The Observability & Operations Engineer will design and implement a comprehensive observability strategy across all AWS environments, leveraging AI-powered tools to detect anomalies and surface insights automatically. They will build and manage monitoring platforms, use AI coding assistants to accelerate development, and own the incident management lifecycle.

Requirements

7-10 years of experience in Software Engineering, Cloud Operations, or Site Reliability Engineering
5+ years of hands-on experience with AWS infrastructure and AWS PaaS services; certifications are a plus
Demonstrated experience building repeatable, code-first pipelines and treating operational configuration as first-class software
Experience working with polyglot environments including Java, Kotlin, and Node.js
Demonstrated experience using AI tools in a professional setting
Deep experience with enterprise observability platforms
Proficiency with distributed tracing frameworks and log management platforms
Strong understanding of SRE principles including SLOs, SLAs, error budgets, and chaos engineering
Hands-on FinOps experience
Strong working knowledge of AWS PaaS services
Experience instrumenting polyglot applications and cloud-native microservices for observability
Proven ability to build repeatable, code-first pipelines
Experience with CI/CD tooling, specifically Harness
Solid understanding of Infrastructure as Code using Terraform
Fluency with AI tools in day-to-day work
Ability to lead incident response, facilitate blameless post-mortems, and drive long-term reliability improvements
Strong collaboration skills for working across platform and product engineering teams
Knowledge of containerization technologies and microservices architecture

Benefits

Generous Paid Time Off
401k Matching
Retirement Plan
Health Insurance

Observability & Operations Engineer

About Fullbay

Fullbay