Dexcom Corporation seeks a Senior Site Reliability Engineer to architect, build, and operate the resilient, scalable, and secure cloud infrastructure that powers our R&D Platform, which serves millions of customers every day.
Responsibilities
- Architect and evolve Dexcom’s observability ecosystem, defining standards for metrics, logging, tracing, and SLO/SLA-driven reliability.
- Design, build, and operate highly available cloud infrastructure on Google Cloud Platform (GCP), focusing on performance, scalability, and security.
- Lead Kubernetes platform operations, improving cluster reliability, multi-tenant architecture, and deployment patterns.
- Diagnose and resolve complex failures across cloud infrastructure, CI/CD pipelines, policy engines, and microservices.
- Set the direction for Infrastructure as Code (IaC), defining best practices with Terraform, Pulumi, or Crossplane for automated provisioning.
- Drive automation strategy to eliminate toil, build self-service capabilities, and operationalize guardrails for compliance and cost efficiency.
- Lead major incident response and conduct deep post-incident reviews to implement remediations that prevent recurring failure categories.
- Mentor engineers and influence cross-functional practices to help teams adopt operational discipline and cloud-native best practices.
- Partner with developer teams to optimize capacity strategies and ensure the seamless delivery of high-quality solutions.
Benefits
- Comprehensive benefits program
- Growth opportunities on a global scale
- Access to career development through in-house learning programs and/or qualified tuition reimbursement
- An exciting, innovative, industry-leading organization committed to our employees, customers, and the communities we serve