This role is responsible for designing, building, and maintaining data solutions, and for analyzing and interpreting data to provide actionable insights that drive business decisions.
Responsibilities
- Design, develop, and maintain data solutions for data generation, collection, and processing
- Serve as a key team member who assists in the design and development of data pipelines
- Create data pipelines and ensure data quality by implementing ETL processes to migrate and deploy data across systems
- Take ownership of data pipeline projects from inception to deployment, managing scope, timelines, and risks
- Collaborate with cross-functional teams to understand data requirements and design solutions that meet business needs
- Develop and maintain data models, data dictionaries, and other documentation to ensure data accuracy and consistency
- Implement data security and privacy measures to protect sensitive data
- Leverage cloud platforms (AWS preferred) to build scalable and efficient data solutions
- Collaborate and communicate effectively with product teams
- Collaborate with Data Architects, Business SMEs, and Data Scientists to design and develop end-to-end data pipelines to meet fast-paced business needs across geographic regions
- Identify and resolve complex data-related challenges
- Adhere to best practices for coding, testing, and designing reusable code and components
- Explore new tools and technologies that could improve ETL platform performance
- Participate in sprint planning meetings and provide estimates for technical implementation work
- Design and develop data pipelines leveraging Databricks, PySpark, and SQL to ingest, transform, and process large-scale datasets
- Engineer solutions for both structured and unstructured data to enable advanced analytics and insights
- Implement and schedule automated workflows for data ingestion, transformation, and deployment using Databricks Jobs and notebooks, and monitor them on an ongoing basis
- Apply performance optimization techniques, including Spark job tuning, caching, partitioning, and indexing, to improve scalability and efficiency
- Build integrations with multiple data sources, such as SQL databases, APIs, and cloud storage platforms, ensuring seamless connectivity and reliability
- Collaborate effectively with global teams across time zones to maintain alignment, resolve issues, and deliver on shared objectives
Benefits
- Health insurance
- Retirement plan
- Paid time off