Key Responsibilities:
Data Architecture & Engineering:
- Design and implement scalable data architectures built on BigQuery, Apache Iceberg, Starburst, and Trino.
- Develop high-performance ETL/ELT pipelines for structured and unstructured data.
- Optimize SQL queries and workflows for efficient analytics and reporting.
Cloud & Big Data Infrastructure:
- Build and maintain cloud-based data pipelines and storage solutions using Google Cloud Platform (GCP) and BigQuery.
- Implement best practices for data governance, security, and compliance.
- Optimize ingestion, storage, and query performance for high-volume datasets.
Data Processing & Analytics:
- Leverage Apache Iceberg for large-scale data lake table management with ACID transaction support.
- Utilize Trino and Starburst (its enterprise distribution) for distributed query processing and federated access to data across sources.
- Develop data partitioning, clustering, and caching strategies to improve query performance.
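To illustrate the partitioning strategies this role involves, here is a minimal Python sketch (record layout and field names are hypothetical) that routes records into day-level partitions, the same date-based layout commonly used for Iceberg and BigQuery partitioned tables so queries can prune irrelevant partitions:

```python
from collections import defaultdict
from datetime import date

def partition_key(record: dict) -> str:
    """Derive a day-granularity partition key from the record's event date.

    Day-level date partitioning mirrors common analytical table layouts
    (e.g. Iceberg's day partition transform).
    """
    d: date = record["event_date"]
    return d.isoformat()

def partition_records(records: list[dict]) -> dict[str, list[dict]]:
    """Group records by partition key so each partition can be written out
    separately, enabling partition pruning at query time."""
    partitions: dict[str, list[dict]] = defaultdict(list)
    for rec in records:
        partitions[partition_key(rec)].append(rec)
    return dict(partitions)

records = [
    {"event_date": date(2024, 5, 1), "value": 10},
    {"event_date": date(2024, 5, 1), "value": 20},
    {"event_date": date(2024, 5, 2), "value": 30},
]
parts = partition_records(records)
```

In a real pipeline the partition transform would be declared in the table definition rather than applied by hand; the sketch only shows the grouping logic.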
Collaboration & Integration:
- Work with data scientists, analysts, and business stakeholders to understand data needs.
- Collaborate with DevOps and platform engineering teams to implement CI/CD pipelines and infrastructure-as-code for data workflows.
- Integrate data from multiple sources, ensuring accuracy and consistency.
Performance Optimization & Monitoring:
- Monitor, troubleshoot, and optimize data pipelines for efficiency, scalability, and reliability.
- Implement data quality frameworks and automated validation checks.
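As a minimal sketch of the kind of automated validation check such a framework might run (the column names and rules here are hypothetical, not part of the role description):

```python
def validate_rows(rows, required=("id", "amount")):
    """Run simple data-quality checks on a batch of rows:
    required fields must be present and non-null, and `amount`
    must be non-negative. Returns (valid_rows, errors)."""
    valid, errors = [], []
    for i, row in enumerate(rows):
        # Null/missing-field check for every required column.
        problems = [f for f in required if row.get(f) is None]
        # Range check, only meaningful when the field is present.
        if not problems and row["amount"] < 0:
            problems.append("amount<0")
        if problems:
            errors.append((i, problems))
        else:
            valid.append(row)
    return valid, errors

rows = [
    {"id": 1, "amount": 9.5},
    {"id": 2, "amount": -1.0},   # fails the range check
    {"id": None, "amount": 3.0}, # fails the null check
]
good, bad = validate_rows(rows)
```

Production frameworks typically express such rules declaratively and emit metrics alongside the rejected rows, but the pass/fail split above is the core of an automated validation gate.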