We are seeking a Lead Platform Engineer who is deeply hands-on, highly accountable, and a recognized technical role model for platform and DevOps engineers.
Requirements
- Design, build, and own scalable, secure platform infrastructure on AWS
- Define and standardize platform patterns for compute, networking, deployments, and observability
- Build, enhance, and architect CI/CD pipelines using GitHub, GitLab, Jenkins, and Azure DevOps
- Lead Infrastructure-as-Code practices using Terraform, including modules, state management, conventions, and governance
- Own container orchestration platforms (Docker, ECS/EKS), including networking, upgrades, resilience, and reliability
- Design and develop automation-first solutions using Python, Shell, PowerShell, or Go
- Build internal tooling for deployments, operational housekeeping, diagnostics, and self-service capabilities
- Drive clean, modular, and reusable automation patterns that can be adopted and extended across teams
- Own observability standards using Prometheus, Grafana, Splunk, ELK, Datadog, and CloudWatch
- Define and enforce SLIs, SLOs, alert thresholds, and alert noise reduction strategies
- Ensure deep visibility into system health, performance bottlenecks, and failure modes
- Independently lead P1/P2 production incidents, including triage, mitigation, decision-making, and recovery
- Act as the technical leader during incidents, not merely a coordinator
- Collaborate closely with application, QA, security, and product teams to ensure stable environments and smooth releases
- Clearly communicate risks, trade-offs, architectural decisions, and incident status to both technical and non-technical stakeholders
- Perform capacity planning, cost optimization, and performance tuning across platforms
- Lead proofs of concept for new DevOps, platform, and automation technologies
- Advocate continuous improvement in platform reliability, security posture, and engineering maturity
- Awareness of AIOps concepts such as anomaly detection, alert correlation, and predictive insights
- Exposure to MLOps pipelines, including model deployment and containerized ML workloads
Benefits
- Health & Wellness
- Flexible Downtime
- Continuous Learning
- Invest in Your Future
- Family Friendly Perks
- Beyond the Basics