This Reliability Engineer opening supports the U.S. Air Force Cloud One Architecture and Common Shared Services contract. The Reliability Engineer is responsible for ensuring the availability, performance, scalability, and resiliency of mission-critical systems.
Requirements
- Bachelor's degree and eight (8) or more years of experience, or Master's degree and six (6) or more years of experience
- Active Secret clearance (at minimum) required to start
- U.S. citizenship required
- Experience with cloud platforms (AWS, Azure, OCI, or GCP), including managed services
- Experience with containerized environments (Docker, Kubernetes)
- Familiarity with CI/CD pipelines and deployment automation
- Service level objectives (SLOs) and error budgets
- Capacity modeling and performance testing
- Strong understanding of distributed systems and high-availability architectures
- Linux/Windows system administration
- Networking fundamentals (DNS, TCP/IP, load balancing)
- Hands-on experience with monitoring and observability tools (e.g., Prometheus, Grafana, ELK/Elastic, Datadog, Azure Monitor)
- Infrastructure as Code (Terraform, ARM, CloudFormation)
- Scripting or programming languages (Python, Bash, Go, PowerShell, or similar)
- Experience supporting incident management and on-call operations
Benefits
- Medical
- Dental
- Vision
- Accidental Death & Dismemberment (AD&D)
- Short-Term Disability (STD)
- Long-Term Disability (LTD)
- Company-paid Life Insurance
- 401(k) with employer contribution
- Paid Time Off
- Pet Insurance