Key Responsibilities:
Data Pipeline Development
- Develop and maintain data pipelines using Databricks, Python, and Apache Spark
- Implement scalable batch and real-time data processing solutions
AWS Cloud Integration
- Leverage AWS services including S3, Lambda, Step Functions, EC2, RDS, Glue, SQS, Redshift, SNS, CloudWatch, and CloudTrail
- Optimize solutions for data security, performance, and cost efficiency
Data Analysis & Support
- Use SQL for data analysis, transformation, and validation
- Support BI and analytics teams in delivering data-driven solutions
Monitoring & Documentation
- Monitor and troubleshoot pipelines using CloudWatch and CloudTrail
- Document technical designs, workflows, and implementation details