Allata, a global consulting and technology services firm, is looking for a Data Manager with strong experience in enterprise data platform architecture and governance to lead client-facing data platform implementations. The role blends high-impact architectural responsibilities with technical leadership in designing, building, deploying, and optimizing data pipelines and data products on Lakehouse/EDW platforms (with an emphasis on Databricks).
Requirements
- Define the overall data platform architecture (Lakehouse/EDW), including reference patterns (Medallion, Lambda, Kappa), technology selection, and integration blueprint.
- Design conceptual, logical, and physical data models to support multi-tenant and vertical-specific data products; standardize logical layers (ingest/raw, staged/curated, serving).
- Establish data governance, metadata, cataloging (e.g., Unity Catalog), lineage, data contracts, and classification practices to support analytics and ML use cases.
- Define security and compliance controls: access management (RBAC/IAM), data masking, encryption (in transit/at rest), network segmentation, and audit policies.
- Architect scalability, high availability, disaster recovery (RPO/RTO), and capacity & cost management strategies for cloud and hybrid deployments.
- Lead selection and integration of platform components (Databricks, Delta Lake, Delta Live Tables, Fivetran, Azure Data Factory / Data Fabric, orchestration, monitoring/observability).
- Design and enforce CI/CD patterns for data artifacts (notebooks, packages, infra-as-code), including testing, automated deployments, and rollback strategies.
- Define ingestion patterns (batch & streaming), file compaction strategies, partitioning schemes, and storage layout to optimize IO and costs.
- Specify observability practices: metrics, SLAs, health dashboards, structured logging, tracing, and alerting for pipelines and jobs.
- Act as technical authority and mentor for Data Engineering teams; perform architecture and code reviews for critical components.
- Collaborate with stakeholders (Data Product Owners, Security, Infrastructure, BI, ML) to translate business requirements into technical solutions and a roadmap.
- Design, develop, test, and deploy processing modules using Spark (PySpark/Scala), Spark SQL, and database stored procedures where applicable.
- Build and optimize data pipelines on Databricks and complementary engines (SQL Server, Azure SQL, AWS RDS/Aurora, PostgreSQL, Oracle); a brief illustrative sketch of this kind of work follows the list.
- Implement DevOps practices: infra-as-code, CI/CD pipelines (ingestion, transformation, tests, deployment), automated testing and version control.
- Troubleshoot and resolve complex data quality, performance, and availability issues; recommend and implement continuous improvements.
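To give a concrete sense of the hands-on pipeline work referenced above, here is a minimal, illustrative sketch of a bronze-to-silver step in a Medallion-style flow. It assumes PySpark with Delta Lake on a Databricks runtime; the paths, table names, and columns (raw_orders, orders_silver, order_id, order_ts) are hypothetical and not tied to any specific client engagement.

```python
# Minimal sketch of a bronze-to-silver step in a Medallion-style pipeline.
# Assumes a Databricks runtime with Delta Lake; all names and paths below are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # on Databricks, returns the existing session

# Bronze: land raw data as-is, preserving source fidelity and recording ingestion time.
bronze = (
    spark.read.json("/mnt/raw/orders/")                     # hypothetical landing path
    .withColumn("_ingested_at", F.current_timestamp())
)
bronze.write.format("delta").mode("append").saveAsTable("raw_orders")

# Silver: deduplicate, apply basic data-quality filters, and derive a partition column.
silver = (
    spark.table("raw_orders")
    .dropDuplicates(["order_id"])
    .filter(F.col("order_ts").isNotNull())
    .withColumn("order_date", F.to_date("order_ts"))
)
(
    silver.write.format("delta")
    .mode("overwrite")
    .partitionBy("order_date")   # partitioning scheme chosen to balance file sizes and query patterns
    .saveAsTable("orders_silver")
)
```

In practice, the layer boundaries, data-quality rules, and partitioning scheme would follow the reference architecture, data contracts, and governance standards defined for the engagement.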
Benefits
- Equal Opportunity Employer
- Diverse and Inclusive Work Environment
- Flexible Working Arrangements