AlphaSense is seeking a Staff Site Reliability Engineer to help shape the future of its reliability, scalability, and performance. The role involves architecting core reliability platforms, leading by example in incident response, and driving cultural adoption of SRE best practices across the global engineering organization.
Requirements
- 8+ years of experience in Site Reliability Engineering, DevOps, or a similar role, with at least 3 of those years in a Senior+ SRE position
- Strong background in running production SaaS systems at scale
- Proficiency in at least one programming/scripting language (Python, Go, or similar)
- Hands-on expertise with cloud platforms (AWS, GCP, or Azure) and Kubernetes
- Deep understanding of networking fundamentals (TCP/IP, DNS, HTTP/S, load balancing)
- Experience with monitoring and alerting tools (Prometheus, Grafana, Datadog, ELK)
- Familiarity with advanced observability (OpenTelemetry, continuous profiling)
- Proven incident management experience, including leading high-severity incidents and postmortems
- Strong troubleshooting skills across the full stack
- Excellent communication and collaboration skills