Key Responsibilities
Data Pipeline Development
- Design, develop, and maintain scalable data pipelines and ETL processes.
- Implement complex data transformations using Python and PySpark.
- Work with large-scale structured and unstructured datasets.
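The transformation work above can be sketched in miniature. This is an illustrative plain-Python extract/transform/load sketch, not part of the role description: the record fields (`user_id`, `event_date`, `amount`) and the in-memory source/sink are hypothetical stand-ins, and a production pipeline would typically use PySpark DataFrames against real source and warehouse systems.

```python
from datetime import datetime

def extract(rows):
    """Extract: yield raw records (stand-in for reading from a source system)."""
    yield from rows

def transform(records):
    """Transform: coerce types, parse dates, and drop records that fail to parse."""
    for rec in records:
        try:
            yield {
                "user_id": int(rec["id"]),
                "event_date": datetime.strptime(rec["date"], "%Y-%m-%d").date(),
                "amount": round(float(rec["amount"]), 2),
            }
        except (KeyError, ValueError):
            # Invalid record: skipped here; a real pipeline would route it to a reject sink.
            continue

def load(records):
    """Load: collect into a list (stand-in for writing to a warehouse table)."""
    return list(records)

raw = [
    {"id": "1", "date": "2024-05-01", "amount": "19.990"},
    {"id": "x", "date": "2024-05-01", "amount": "5"},  # non-numeric id: dropped
]
result = load(transform(extract(raw)))
```

The generator chain keeps each stage independent, which mirrors how distributed engines compose lazy transformations before materializing output.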
Database & Query Optimization
- Write optimized SQL and PL/SQL queries for data extraction, transformation, and validation.
- Ensure performance tuning and query optimization for high-volume data workloads.
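One routine tuning step is inspecting the query plan before and after adding an index. The sketch below uses Python's built-in SQLite driver purely for illustration; the `orders` table, its columns, and the index name are hypothetical, and real tuning would target the actual warehouse engine and its own plan-inspection tools.

```python
import sqlite3

# In-memory SQLite database with a hypothetical orders table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)

# Before indexing: the planner has to scan the whole table.
plan_before = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 7"
).fetchall()

# An index on the filter column lets the planner do an indexed lookup instead.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 7"
).fetchall()
```

Comparing `plan_before` and `plan_after` (a full scan versus an index search) is the same workflow as reading `EXPLAIN PLAN` output on a production database.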
Data Quality & Performance
- Ensure data accuracy, consistency, and reliability across workflows.
- Implement validation checks and monitoring mechanisms.
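Validation checks of the kind described above can be expressed as a table of named predicates run over each record. This is a minimal sketch under assumed record fields (`user_id`, `amount`); the check names and thresholds are illustrative, and a monitoring system would consume the failure list rather than hold it in memory.

```python
def validate(records, checks):
    """Run each named check against every record; return failures for monitoring."""
    failures = []
    for i, rec in enumerate(records):
        for name, check in checks.items():
            if not check(rec):
                failures.append({"row": i, "check": name})
    return failures

# Hypothetical rule set: every rule is a predicate that must hold for a valid record.
checks = {
    "amount_non_negative": lambda r: r.get("amount", 0) >= 0,
    "user_id_present": lambda r: r.get("user_id") is not None,
}

records = [
    {"user_id": 1, "amount": 10.0},
    {"user_id": None, "amount": -5.0},  # fails both checks
]
failures = validate(records, checks)
```

Keeping rules as data (a dict of predicates) makes it cheap to add checks and to report per-rule failure counts to a monitoring dashboard.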
Collaboration & Support
- Collaborate with analytics, BI, and application teams to meet downstream data requirements.
- Support production deployments, monitoring, and incident resolution.
- Maintain documentation and promote data engineering best practices.