CoreWeave is seeking a Senior Site Reliability Engineer, Data Infrastructure to own the reliability and performance of its Kubernetes-based data platform. The role involves designing and operating highly available, multi-region systems that meet strict uptime and latency targets. The team builds and operates the foundational systems that power data ingestion, transformation, analytics, and internal AI workloads at scale.

Requirements

5+ years of experience in Site Reliability Engineering, Platform Engineering, or Infrastructure Engineering roles
Deep expertise in Kubernetes and containerized software services, including cluster design, operations, and troubleshooting in production environments
Strong experience building and operating CI/CD systems, including tools such as Argo CD and GitHub Actions
Proven experience owning production systems with high availability requirements (≥99.99% uptime), including incident response, SLI/SLO/SLA definition, error budgets, and postmortems
Hands-on experience designing and operating geo-replicated, multi-region, active-active systems, including traffic routing, failover strategies, and data consistency tradeoffs
Strong experience building and owning observability components, including metrics, logging, and tracing (e.g., Prometheus, Grafana, OpenTelemetry)

Benefits

Medical, dental, and vision insurance - 100% paid for by CoreWeave
Company-paid Life Insurance
Voluntary supplemental life insurance
Short- and long-term disability insurance
Flexible Spending Account
Health Savings Account
Tuition Reimbursement
Ability to Participate in Employee Stock Purchase Program (ESPP)
Mental Wellness Benefits through Spring Health
Family-Forming support provided by Carrot
Paid Parental Leave
Flexible, full-service childcare support with Kinside
401(k) with a generous employer match
Flexible PTO
Catered lunch each day in our office and data center locations
