Key Responsibilities:
Technical Leadership
- Lead and mentor a team of data engineers, fostering best practices in coding, design, and delivery.
- Drive adoption of modern data engineering frameworks, tools, and methodologies.
- Translate complex business requirements into data pipelines, architectures, and workflows.
Data Pipeline Development
- Architect, develop, and optimize scalable ETL/ELT pipelines using Apache Spark, Hive, AWS Glue, and Trino.
- Handle complex data workflows involving structured and unstructured data.
- Develop real-time and batch processing systems supporting BI, analytics, and ML applications.
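The pipelines described above follow a classic extract-transform-load shape. As a minimal, tool-agnostic sketch in plain Python (the record fields and sources are hypothetical, not part of the role description; a production version would run on Spark, Glue, or similar):

```python
# Hypothetical raw event records -- field names are illustrative only.
RAW_EVENTS = [
    {"user_id": "u1", "ts": "2024-01-15T10:00:00", "amount": "19.99"},
    {"user_id": "u2", "ts": "2024-01-15T11:30:00", "amount": "bad"},
    {"user_id": "u1", "ts": "2024-01-16T09:15:00", "amount": "5.00"},
]

def extract(source):
    """Extract: read raw records from a source (here, an in-memory list)."""
    return list(source)

def transform(records):
    """Transform: parse types, drop malformed rows, derive a date partition key."""
    clean = []
    for r in records:
        try:
            amount = float(r["amount"])
        except ValueError:
            continue  # skip (or quarantine) malformed rows
        clean.append({
            "user_id": r["user_id"],
            "event_date": r["ts"][:10],  # partition key, as in a data-lake layout
            "amount": amount,
        })
    return clean

def load(records, sink):
    """Load: append records to a sink, grouped by the event_date partition."""
    for r in records:
        sink.setdefault(r["event_date"], []).append(r)
    return sink

sink = load(transform(extract(RAW_EVENTS)), {})
```

The same extract/transform/load boundaries apply whether the job runs as a batch over S3 objects or as micro-batches in a streaming system.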
Cloud & Infrastructure Management
- Build and maintain cloud-based data solutions using AWS services such as S3, Athena, Redshift, EMR, DynamoDB, and Lambda.
- Design federated query capabilities across heterogeneous data sources using Trino.
- Manage the Hive Metastore to track schemas and metadata across data lakes.
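A metastore's core job is tracking table schemas and partition keys separately from the data files themselves. As a toy illustration only (loosely modelled on what a Hive Metastore records; the table and column names are invented), using an in-memory SQLite registry:

```python
import sqlite3

# Toy schema registry: table names, column definitions, partition keys.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE table_metadata (
        table_name     TEXT PRIMARY KEY,
        columns        TEXT NOT NULL,  -- e.g. "user_id STRING, amount DOUBLE"
        partition_keys TEXT            -- e.g. "event_date"
    )
""")

def register_table(name, columns, partition_keys=None):
    """Record (or replace) a table's schema and partitioning metadata."""
    conn.execute(
        "INSERT OR REPLACE INTO table_metadata VALUES (?, ?, ?)",
        (name, columns, partition_keys),
    )

def describe_table(name):
    """Look up a table's schema; returns None for unknown tables."""
    row = conn.execute(
        "SELECT columns, partition_keys FROM table_metadata WHERE table_name = ?",
        (name,),
    ).fetchone()
    return {"columns": row[0], "partition_keys": row[1]} if row else None

register_table("events", "user_id STRING, amount DOUBLE", "event_date")
```

Engines such as Spark, Trino, and Athena consult this kind of catalog at query-planning time to locate data and prune partitions.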
Performance Optimization
- Optimize Apache Spark jobs and Hive queries for performance and resource efficiency.
- Implement caching and indexing strategies in Trino.
- Monitor and tune system performance continuously.
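The caching strategies mentioned above share one principle: avoid recomputing or re-fetching results for repeated keys. A minimal stand-in sketch using Python's built-in memoization (the lookup function and its workload are hypothetical; Trino's actual caching operates at the query and storage layers):

```python
from functools import lru_cache

CALL_COUNT = {"n": 0}  # tracks how many real (uncached) lookups ran

@lru_cache(maxsize=128)
def expensive_lookup(key):
    """Stand-in for a costly scan or remote fetch; results are memoized."""
    CALL_COUNT["n"] += 1
    return key.upper()

# Repeated queries for the same key hit the cache instead of recomputing.
for k in ["users", "orders", "users", "users"]:
    expensive_lookup(k)
```

The same trade-off applies at every layer: cache hit ratio versus staleness and memory budget, which is why continuous monitoring accompanies any tuning work.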
Collaboration & Stakeholder Engagement
- Work closely with data scientists, analysts, and business teams to deliver actionable insights.
- Ensure data infrastructure aligns with organizational goals and compliance standards.
Data Governance & Quality
- Establish and enforce data quality standards, governance practices, and monitoring.
- Ensure data security, privacy, and regulatory compliance.
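Enforcing data quality standards typically means running named rule checks over incoming records and reporting violations rather than silently dropping data. A minimal sketch (the rules and record fields are invented for illustration):

```python
def check_quality(records, rules):
    """Apply each named rule to every record; return violating row indices per rule."""
    violations = {name: [] for name in rules}
    for i, rec in enumerate(records):
        for name, rule in rules.items():
            if not rule(rec):
                violations[name].append(i)
    return violations

# Hypothetical rules for an orders dataset -- names and fields are illustrative.
RULES = {
    "amount_non_negative": lambda r: r.get("amount", 0) >= 0,
    "user_id_present": lambda r: bool(r.get("user_id")),
}

records = [
    {"user_id": "u1", "amount": 10.0},
    {"user_id": "", "amount": -3.0},
]
report = check_quality(records, RULES)
```

Routing the violation report to monitoring, rather than failing the whole pipeline, lets governance thresholds (e.g. maximum violation rate) be enforced as policy.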
Innovation & Continuous Learning
- Stay current with industry trends and emerging data engineering technologies.
- Identify and implement improvements in data architecture and processes.