As a Site Reliability Engineer at Point72, you will design and implement automated operational workflows, build observability solutions, and partner with development teams to improve application reliability and deployment safety.
Requirements
- Strong hands-on experience with Linux and Windows operating systems
- Proven experience building automation and tooling using Python or similar languages
- Deep understanding of observability and monitoring, preferably with Datadog
- Experience with CI/CD pipelines and deployment automation (Bitbucket, GitHub Actions, Jenkins, etc.)
- Operational and performance knowledge of SQL Server and MongoDB
- Familiarity with cloud platforms (AWS or similar) and hybrid architectures
- Solid understanding of networking concepts such as DNS, load balancing, and TCP/IP
- Experience working closely with application development teams in an SRE or DevOps role
- Experience with Kubernetes, OpenShift, and containerized workloads
- Knowledge of infrastructure-as-code tools (Terraform, CloudFormation, ARM templates)
- Experience implementing automated scaling and performance tuning
- Background in reliability engineering or DevOps in an enterprise environment
- Familiarity with security and compliance considerations in production systems
- Strong bias toward automation over manual processes
- Focus on improving long-term reliability rather than reactive firefighting
- Comfortable owning systems end-to-end and driving improvements
- Clear communication skills with the ability to work effectively across engineering, platform, and operations teams
- Commitment to the highest ethical standards
Benefits
- Generous Paid Time Off
- 401(k) Matching
- Retirement Plan
- Visa Sponsorship
- Four-Day Work Week
- Generous Parental Leave
- Tuition Reimbursement
- Relocation Assistance