The Sr. Data Engineer will design, build, and operate scalable data pipelines and curated datasets that power analytics products, reporting, and advanced modeling. The role focuses on reliability, performance, data quality, and governance across batch and (where applicable) streaming workloads.
Responsibilities
- Build and maintain robust ETL/ELT pipelines for ingestion, transformation, and aggregation of large-scale datasets on Hadoop and enterprise data platforms.
- Develop high-performance data processing jobs using PySpark/Spark, Python, and SQL (including engines such as Impala where applicable).
- Implement and automate data quality checks, reconciliation, lineage documentation, and monitoring to ensure trust in downstream analytics and AI use cases.
- Optimize pipeline performance and cost through partitioning, file formats, compute tuning, and efficient query patterns.
- Contribute to CI/CD for data workflows (testing, code reviews, deployment automation), promoting engineering best practices and maintainable codebases.
- Support data governance, privacy, and security requirements (PII handling, access controls, auditability) in collaboration with platform and risk partners.
- Collaborate with data scientists to publish analysis-ready and ML-ready datasets, including feature generation and repeatable data preparation processes.
- Troubleshoot production issues, participate in on-call/operational rotations, and drive root-cause fixes to improve reliability.
- Communicate data platform capabilities, limitations, and trade-offs clearly to technical and non-technical stakeholders.
Qualifications
- Strong problem-solving skills, with the ability to debug complex distributed data issues independently.
- Clear written and verbal communication with both technical engineers and non-technical business stakeholders.
Benefits
- Generous Paid Time Off
- 401(k) Matching
- Retirement Plan
- Visa Sponsorship
- Four Day Work Week
- Generous Parental Leave
- Tuition Reimbursement
- Relocation Assistance