The Senior AI Data Platform Engineer will design and build streaming-first data pipelines, own and extend the ML Attribute Store, build MCP-compatible Agent Data APIs, and develop an agentic framework for autonomous AI agents. The role requires 6+ years of experience in data platform engineering, distributed systems, or backend infrastructure at scale, along with strong experience with Apache Spark, Databricks, Delta Lake, or equivalent lakehouse technologies.
Requirements
- 6+ years of experience in data platform engineering, distributed systems, or backend infrastructure at scale.
- Deep hands-on experience with Apache Spark, Databricks, Delta Lake, or equivalent lakehouse technologies (Iceberg, Hudi).
- Proven track record building and operating large-scale pipelines processing billions of events daily with sub-hour latency SLAs.
- Strong experience with streaming systems: Kafka, Kinesis, Flink, Spark Structured Streaming, or Delta Live Tables.
- Proficiency in Python and/or Scala; SQL fluency required. Java or Go is a plus.
- Experience with cloud platforms (AWS or Azure), containerization (Docker, Kubernetes), and CI/CD for data pipelines.
- Production experience integrating LLMs into engineering workflows — not prototypes, but systems running against real data with real users.
- Hands-on experience with agentic AI frameworks and multi-agent orchestration (LangChain, LangGraph, CrewAI, AutoGen, or custom agent loops with memory, planning, and tool routing).
- Understanding of MCP (Model Context Protocol) and/or A2A protocols for exposing platform capabilities as agent-consumable tool servers — or demonstrable ability to build equivalent agent-tool integration surfaces.
- Experience building or operating ML Feature Stores (online and/or offline), including training-serving skew mitigation, feature freshness trade-offs, and real-time feature computation.
- Familiarity with RAG architectures: embedding generation, vector databases (FAISS, Pinecone, Weaviate, Databricks Vector Search), document chunking strategies, and retrieval evaluation.
- Exposure to semantic layers, knowledge graphs, or metadata-driven data discovery systems (Unity Catalog, DataHub, OpenMetadata) that enable agents to autonomously navigate enterprise data catalogs.
- Ability to build evaluation and feedback pipelines for AI systems — measuring agent accuracy, latency, cost attribution per workflow, and reliability at scale.
- Demonstrated use of AI-powered developer tools (Claude Code, Cursor, GitHub Copilot, or similar) to accelerate engineering velocity.
- Agentic-first instinct: you default to “can an agent do this?” before reaching for manual solutions, scripts, or traditional automation.
- Challenger mentality: you question inherited architecture, push back on “we’ve always done it this way,” and drive fast improvement through first-principles thinking.
- Extreme bias for action and time-to-market: you ship iteratively, prefer “good enough now” over “perfect later,” and unblock yourself.
Benefits
- 401k Matching
- Generous Paid Time Off
- Retirement Plan
- Tuition Reimbursement