The Senior Cloud Data Engineer will be responsible for operating and optimizing our cloud-based data processing environment, working with Databricks, AWS services, Spark, Unity Catalog, and Delta Lake to ensure efficient, secure, and reliable data pipelines and analytics workloads.
Responsibilities
- Refines data transformations using PySpark and Spark SQL within Databricks notebooks, enabling efficient processing of large-scale datasets.
- Leverages orchestration tools such as Apache Airflow to automate, schedule, and oversee data workflows, ensuring consistent and reliable execution.
- Participates in code reviews, testing, and documentation as part of the development lifecycle.
- Supports and troubleshoots Databricks jobs, Spark workloads, and AWS-based data processes across DEV/QA/Production.
- Optimizes Databricks clusters and jobs for performance and cost.
- Maintains and improves existing data pipelines built with AWS CodePipeline, Delta Lake, and Databricks notebooks.
- Works closely with data engineering and analytics teams to improve data quality and pipeline reliability.
- Maintains and enhances CI/CD workflows for Databricks deployments using AWS tools.
- Manages access controls with IAM and Unity Catalog, ensuring secure and compliant data usage.
- Performs regular monitoring, troubleshooting, and root-cause analysis of data and compute workloads.
Benefits
- Health insurance
- Retirement plan
- Generous paid time off
- Tuition reimbursement
- Other benefits (not specified)