Role Overview
Pyramid Systems is looking for a Senior Data Engineer who is passionate about bringing creative architectural solutions to end customers. The role involves planning, creating, and maintaining data architectures and ensuring they align with business requirements. The ideal candidate will have 8+ years of IT experience focusing on enterprise data architecture and management.
What You Will Do
Plan, create, and maintain data architectures, obtain data, formulate dataset processes, and store optimized data. Identify problems and inefficiencies and apply solutions. Determine tasks where manual participation can be eliminated with automation.
Why It Might Be a Fit
The ideal candidate will have expertise in Spark, Python, Databricks, Data Lake, and SQL, along with experience in Structured Streaming, Delta Lake concepts, and Delta Live Tables. The role offers competitive compensation, benefits, and learning and development opportunities.
Requirements
- 8+ years of IT experience focusing on enterprise data architecture and management
- Must be able to obtain a Public Trust security clearance
- MUST BE US CITIZEN
- Bachelor's degree required
- Experience in Conceptual/Logical/Physical Data Modeling & expertise in Relational and Dimensional Data Modeling
- Experience with Databricks, Structured Streaming, Delta Lake concepts, and Delta Live Tables required
- Additional experience with Spark, Spark SQL, Spark DataFrames and DataSets, and PySpark
- Knowledge of Data Lake concepts such as time travel, schema evolution, and optimization
- Structured Streaming and Delta Live Tables with Databricks a bonus
- Experience leading and architecting enterprise-wide initiatives, specifically system integration, data migration, transformation, data warehouse builds, data mart builds, and data lake implementation and support
- Advanced level understanding of streaming data pipelines and how they differ from batch systems
- Ability to formalize concepts such as handling late data, defining windows, and data freshness
- Advanced understanding of ETL and ELT, and of ETL/ELT tools such as SSIS, Pentaho, Data Migration Service, etc.
- Understanding of concepts and implementation strategies for different incremental data loads such as tumbling window, sliding window, high watermark, etc.
- Familiarity and/or expertise with Great Expectations or other data quality/data validation frameworks a bonus
- Understanding of streaming data pipelines and batch systems
- Familiarity with concepts such as late data, defining windows, and how window definitions impact data freshness
- Advanced level SQL experience (Joins, Aggregation, Windowing functions, Common Table Expressions, RDBMS schema design, Postgres performance optimization)
- Indexing and partitioning strategy experience
- Debug, troubleshoot, design and implement solutions to complex technical issues
- Experience with large-scale, high-performance enterprise big data application deployments and solutions
- Understanding of how to create DAGs to define workflows
- Familiarity with CI/CD pipelines, containerization, and pipeline orchestration tools such as Airflow, Prefect, etc. a bonus but not required
- Architecture experience in AWS environment a bonus
- Familiarity with Kinesis and/or Lambda a bonus, specifically how to push and pull data, how to use AWS tools to view data in Kinesis streams, and how to process massive data at scale
- Experience with Docker, Jenkins, and CloudWatch
- Ability to write and maintain Jenkinsfiles for supporting CI/CD pipelines
- Experience configuring and optimizing AWS Lambda functions
- Experience working with DynamoDB to query and write data
- Experience with S3
- Knowledge of Python (Python 3 desired) for CI/CD pipelines a bonus
- Familiarity with Pytest and Unittest a bonus
- Experience working with JSON and defining JSON Schemas a bonus
- Experience setting up and managing Confluent/Kafka topics and ensuring Kafka performance a bonus
- Familiarity with Schema Registry, message formats such as Avro, ORC, etc.
- Understanding of Kafka Streams and of how to manage ksqlDB SQL files and migrations
- Ability to thrive in a team-based environment
- Experience briefing technology partners, stakeholders, team members, and senior management on the benefits and constraints of technology solutions
Benefits
- Employee Stock Ownership Program
- FlexPTO
- Learning and development opportunities