Key Responsibilities:
Data Pipeline Development
- Design, build, and maintain scalable data pipelines using Python and Apache Spark on Databricks
- Develop ETL/ELT workflows to ingest, transform, and load structured and unstructured data from multiple sources (see the sketch after this list)
- Optimize data workflows for performance, reliability, and scalability
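
To make the pipeline work concrete, here is a minimal PySpark ETL sketch of the read-clean-write pattern this role would own. The bucket paths, table layout, and column names (event_id, event_ts, event_date) are hypothetical placeholders, not details from this posting.

```python
# Minimal ETL sketch: ingest raw JSON events from S3, clean them, and publish
# a curated Delta table. All paths, columns, and names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("example-etl").getOrCreate()

# Extract: read semi-structured JSON landed in an S3 raw zone
raw = spark.read.json("s3://example-bucket/raw/events/")

# Transform: normalize timestamps, derive a partition column,
# drop malformed rows, and deduplicate on the event key
clean = (
    raw.withColumn("event_ts", F.to_timestamp("event_ts"))
       .withColumn("event_date", F.to_date("event_ts"))
       .filter(F.col("event_id").isNotNull())
       .dropDuplicates(["event_id"])
)

# Load: write the curated result as a date-partitioned Delta table
(clean.write.format("delta")
      .mode("overwrite")
      .partitionBy("event_date")
      .save("s3://example-bucket/curated/events/"))
```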
Cloud & Integration
- Work with AWS services such as S3, Glue, Lambda, DynamoDB, and API Gateway for integration and processing (a sketch follows this list)
- Develop Databricks notebooks, jobs, and workflows for data processing and analytics
- Build and expose APIs, and integrate data solutions with AWS service APIs
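
As one illustration of this kind of integration, below is a hypothetical AWS Lambda handler that reacts to an S3 put event and indexes the new object in DynamoDB. The bucket, table name (example-object-index), and schema are assumptions for the sketch, not requirements of the role.

```python
# Hypothetical Lambda handler: triggered by an S3 put notification, it records
# metadata for each new object in a DynamoDB table so downstream jobs can
# discover it. Bucket and table names are illustrative only.
import json
import boto3

s3 = boto3.client("s3")
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("example-object-index")  # hypothetical table name

def handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # Fetch object metadata without downloading the payload
        head = s3.head_object(Bucket=bucket, Key=key)
        table.put_item(Item={
            "object_key": key,
            "bucket": bucket,
            "size_bytes": head["ContentLength"],
            "etag": head["ETag"],
        })
    return {"statusCode": 200,
            "body": json.dumps({"processed": len(event["Records"])})}
```

The same handler shape, fronted by API Gateway instead of an S3 trigger, would cover the API-integration side of this work.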
Data Quality & Compliance
- Implement data quality checks, monitoring, and alerting mechanisms (see the sketch after this list)
- Ensure compliance with data governance and security standards, including HIPAA regulations
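
A minimal sketch of a data quality gate in PySpark, assuming a curated Delta table like the one above: the job validates the table before publishing and fails loudly if any check does not pass. The table path, column names, and checks are hypothetical examples.

```python
# Data quality gate sketch: run a few assertions over a curated table and
# raise so the job fails (and alerting fires) if any check does not pass.
# Paths, columns, and checks are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("example-dq").getOrCreate()
df = spark.read.format("delta").load("s3://example-bucket/curated/events/")

total = df.count()
null_ids = df.filter(F.col("event_id").isNull()).count()
dupes = total - df.dropDuplicates(["event_id"]).count()

checks = {
    "non_empty": total > 0,
    "no_null_ids": null_ids == 0,
    "unique_ids": dupes == 0,
}

failed = [name for name, ok in checks.items() if not ok]
if failed:
    # In production this failure would route to monitoring/on-call;
    # here the raised error simply fails the job run.
    raise ValueError(f"Data quality checks failed: {failed}")
```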
Collaboration & Technical Strategy
- Collaborate with data scientists, analysts, and engineering teams to deliver high-quality data solutions
- Participate in code reviews and architectural discussions, and contribute to technical strategy