Data Pipeline Development
- Design, develop, and maintain scalable batch and near real-time data pipelines using AWS services.
- Build and manage ETL/ELT workflows that feed data into Amazon Redshift.
- Integrate data from multiple source systems across different business domains.
- Develop efficient data transformation frameworks using Python and SQL.
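A transformation framework like the one described above is often built as a chain of small, composable steps. The sketch below illustrates that pattern in Python; the step names and record fields are illustrative assumptions, not part of this description.

```python
# Minimal sketch of a composable transformation-step pattern for batch
# pipelines. Step and field names here are illustrative assumptions.
from typing import Callable, Dict, Any

Record = Dict[str, Any]
Transform = Callable[[Record], Record]

def compose(*steps: Transform) -> Transform:
    """Chain transform steps into a single callable pipeline."""
    def pipeline(record: Record) -> Record:
        for step in steps:
            record = step(record)
        return record
    return pipeline

def normalize_keys(record: Record) -> Record:
    # Lowercase and trim column names so downstream SQL is consistent.
    return {k.lower().strip(): v for k, v in record.items()}

def cast_amount(record: Record) -> Record:
    # Cast a string amount to float; a real pipeline would handle errors.
    record["amount"] = float(record["amount"])
    return record

transform = compose(normalize_keys, cast_amount)
row = transform({" Amount ": "12.50", "Region": "EU"})
# row == {"amount": 12.5, "region": "EU"}
```

Each step stays independently testable, and new steps can be added without touching the pipeline driver.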
Data Warehousing & Modeling
- Design and maintain Redshift data models, tables, and views to support analytics workloads.
- Implement dimensional and analytical data models, including star schemas with fact and dimension tables.
- Develop scalable and reliable data warehouse architectures supporting business reporting and analysis.
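A star schema of the kind described above pairs a large fact table with smaller dimension tables. The Redshift DDL below is a hedged sketch; table and column names, key choices, and distribution styles are illustrative assumptions, not a prescribed model.

```python
# Illustrative Redshift DDL for a simple star schema (one fact table,
# one dimension). All names and key choices are assumptions.
FACT_SALES_DDL = """
CREATE TABLE fact_sales (
    sale_id     BIGINT IDENTITY(1,1),
    date_key    INTEGER NOT NULL,
    product_key INTEGER NOT NULL,
    amount      DECIMAL(12,2)
)
DISTKEY (product_key)   -- co-locate fact rows with the joined dimension
SORTKEY (date_key);     -- most analytics queries filter by date
"""

DIM_PRODUCT_DDL = """
CREATE TABLE dim_product (
    product_key  INTEGER NOT NULL,
    product_name VARCHAR(256),
    category     VARCHAR(64)
)
DISTSTYLE ALL           -- small dimension: replicate to every node
SORTKEY (product_key);
"""
```

Distributing the fact table on the join key and replicating small dimensions (`DISTSTYLE ALL`) is a common way to avoid cross-node data movement during joins.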
Redshift Optimization & Performance
- Optimize Amazon Redshift performance using distribution styles, sort keys, and query tuning techniques.
- Analyze query execution plans and tune workloads to reduce latency and resource consumption.
- Monitor data warehouse performance and ensure high availability and reliability.
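Monitoring of the kind described above usually starts from Redshift's system views. The query below is a sketch that flags skewed or heavily unsorted tables via `SVV_TABLE_INFO`; the thresholds are assumptions to be tuned per workload.

```python
# Diagnostic query sketch against Redshift's SVV_TABLE_INFO system view.
# The skew and unsorted thresholds (4x, 20%) are illustrative assumptions.
SKEW_CHECK = """
SELECT "table", diststyle, skew_rows, unsorted
FROM svv_table_info
WHERE skew_rows > 4        -- rows on largest slice vs. smallest
   OR unsorted > 20        -- percent of unsorted rows
ORDER BY skew_rows DESC;
"""
```

Tables surfaced by a check like this are candidates for a different distribution key, a `VACUUM`, or a sort-key review.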
AWS Data Ecosystem Management
- Work extensively within the AWS ecosystem, including S3, Glue, Redshift, Lambda, and IAM.
- Support integration and data consumption workflows involving Snowflake for analytics and downstream systems.
- Ensure data solutions are secure, scalable, and cost-efficient.
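One common integration pattern across these services is an S3-triggered Lambda that loads new objects into Redshift via the Redshift Data API. The sketch below assumes a Parquet staging table; the cluster, database, table, and IAM role names are all placeholders, not values from this description.

```python
# Sketch of an S3-event Lambda that loads a new object into Redshift.
# All identifiers (cluster, database, table, role ARN) are placeholders.

def build_copy_statement(bucket: str, key: str, table: str, iam_role: str) -> str:
    """Build a Redshift COPY statement for a Parquet object in S3."""
    return (
        f"COPY {table} FROM 's3://{bucket}/{key}' "
        f"IAM_ROLE '{iam_role}' FORMAT AS PARQUET;"
    )

def handler(event, context):
    # boto3 is imported inside the handler so the pure SQL builder above
    # stays testable without AWS credentials.
    import boto3
    record = event["Records"][0]["s3"]
    sql = build_copy_statement(
        bucket=record["bucket"]["name"],
        key=record["object"]["key"],
        table="staging.events",  # placeholder target table
        iam_role="arn:aws:iam::123456789012:role/RedshiftCopyRole",  # placeholder
    )
    boto3.client("redshift-data").execute_statement(
        ClusterIdentifier="analytics-cluster",  # placeholder
        Database="analytics",                   # placeholder
        DbUser="etl_user",                      # placeholder
        Sql=sql,
    )
```

Keeping the SQL construction separate from the AWS calls makes the load logic unit-testable and easier to reuse from Glue or Step Functions.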
Data Quality & Governance
- Implement data validation, reconciliation, and monitoring mechanisms to maintain data quality.
- Ensure compliance with data governance and security standards.
- Maintain proper documentation and data lineage for pipelines and transformations.
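A reconciliation mechanism like the one described above can be as simple as comparing source and target row counts within a tolerance. This is a minimal sketch; in practice the counts would come from SQL queries against the source system and Redshift.

```python
# Minimal row-count reconciliation check between a source system and the
# warehouse. The tolerance parameter is an assumption; many checks use 0.
def reconcile(source_count: int, target_count: int, tolerance: float = 0.0) -> bool:
    """Return True if the target row count is within tolerance of the source."""
    if source_count == 0:
        return target_count == 0
    drift = abs(source_count - target_count) / source_count
    return drift <= tolerance

# Exact match required by default; a 1% tolerance allows late-arriving rows.
assert reconcile(100, 100)
assert not reconcile(100, 90)
assert reconcile(100, 99, tolerance=0.01)
```

Checks like this are typically run after each load and wired into alerting so mismatches surface before analysts see stale or incomplete data.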
Production Support & Collaboration
- Provide production support, perform root cause analysis, and resolve data pipeline issues.
- Continuously improve pipeline performance, reliability, and maintainability.
- Collaborate closely with data architects, analysts, and business stakeholders to deliver robust data solutions.