At DraftKings, we're seeking a Lead Site Reliability Engineer to drive key initiatives that improve the reliability, scalability, and efficiency of our infrastructure. As a Lead SRE, you will collaborate across teams to architect infrastructure automation, mentor other engineers, and foster a culture of continuous learning and innovation.
Requirements
- At least 6 years of experience managing distributed cloud environments (GCP, AWS, vSphere, Nutanix)
- Deep expertise in container orchestration (Kubernetes) and container runtimes (Docker, containerd)
- Expert-level understanding of networking and web concepts, with the ability to debug issues down to the packet level
- Strong experience developing software for automation and infrastructure tooling (Go, Python)
- Strong understanding of Linux-based operating systems, including performance tuning, bootloaders, storage, partitioning, kernel debugging, and low-level system optimizations
- Experience with Infrastructure as Code (IaC) and configuration management tools (Terraform, Ansible, Chef, etc.)
- Understanding of applications written in various programming languages (C#/.NET, Java, Elixir, Ruby, etc.)
- Experience with AWS IoT Greengrass management and A/B booting
Benefits