The Sr. Data Engineer will design, build, and operate scalable data pipelines and curated datasets that power analytics products, reporting, and advanced modeling. The role focuses on reliability, performance, data quality, and governance across batch and (where applicable) streaming workloads.

Requirements

Build and maintain robust ETL/ELT pipelines for ingestion, transformation, and aggregation of large-scale datasets on Hadoop and enterprise data platforms.
Develop high-performance data processing jobs using PySpark/Spark, Python, and SQL (including engines such as Impala where applicable).
Implement and automate data quality checks, reconciliation, lineage documentation, and monitoring to ensure trust in downstream analytics and AI use cases.
Optimize pipeline performance and cost through partitioning, file formats, compute tuning, and efficient query patterns.
Contribute to CI/CD for data workflows (testing, code reviews, deployment automation), promoting engineering best practices and maintainable codebases.
Support data governance, privacy, and security requirements (PII handling, access controls, auditability) in collaboration with platform and risk partners.
Collaborate with data scientists to publish analysis-ready and ML-ready datasets, including feature generation and repeatable data preparation processes.
Troubleshoot production issues, participate in on-call/operational rotations, and drive root-cause fixes to improve reliability.
Communicate data platform capabilities, limitations, and trade-offs clearly to technical and non-technical stakeholders.
Strong problem-solving skills, with the ability to debug complex distributed data issues independently.
Clear written and verbal communication with both engineers and non-technical business stakeholders.

Benefits

Generous Paid Time Off
401(k) Matching
Retirement Plan
Visa Sponsorship
Four Day Work Week
Generous Parental Leave
Tuition Reimbursement
Relocation Assistance

Sr. Data Engineer (Big Data & Analytics Engineering)

About Mastercard

Mastercard