Key Responsibilities:
Data Pipeline & ETL/ELT Development
- Design, build, and maintain scalable data pipelines using Python and Apache Spark on Databricks
- Develop ETL/ELT workflows to ingest, transform, and load structured and unstructured data
- Optimize data workflows for performance, reliability, and scalability in cloud environments
Cloud & Data Integration
- Integrate data solutions with AWS services including S3, Glue, Lambda, DynamoDB, and API Gateway
- Develop Databricks notebooks, jobs, and workflows for data processing and analytics
- Implement API integrations for data ingestion and processing
Data Quality & Compliance
- Implement data quality checks, monitoring, and alerting mechanisms
- Ensure compliance with data governance and security standards, including HIPAA and other healthcare regulations
Collaboration & Technical Strategy
- Collaborate with data scientists, analysts, and engineering teams to deliver high-quality solutions
- Participate in code reviews and architectural discussions, and contribute to technical strategy