Yelp is seeking a Site Reliability Engineer to design, deploy, and maintain large-scale Kafka event streaming infrastructure across hybrid and multi-cloud environments. The ideal candidate has built and operated event streaming systems of this kind in production.
Requirements
- Strong hands-on experience designing and implementing large-scale Kafka event streaming capabilities in production
- In-depth knowledge of event streaming/data-in-motion design principles, architecture, and operational nuances
- Programming proficiency in Java, Python, or similar modern languages for tooling, integration, and automation
- Familiarity with Kafka Client APIs (Producer, Consumer, Streams), as well as sizing and capacity planning for high-throughput clusters
- Experience designing and optimizing real-time data streaming solutions with technologies like Apache Flink
- Experience automating infrastructure and operational tasks (configuration management, infrastructure as code, scripting, or related tooling)
- Problem-solving mindset with an eagerness to learn, take initiative, and advocate for infrastructure best practices in a fast-paced environment
Benefits
- Generous paid time off
- 401(k) matching
- Retirement plan
- Flexible work arrangements
- Professional development opportunities