Responsibilities:
- Design, develop, and optimize scalable data pipelines using Databricks and AWS services such as Glue, S3, Lambda, and EMR, along with Databricks notebooks, workflows, and jobs.
- Build a data lake in AWS Databricks.
- Build and maintain robust ETL/ELT workflows using Python and SQL to handle structured and semi-structured data.
- Develop distributed data processing solutions using Apache Spark or PySpark (see the illustrative sketch after this list).
- Partner with data scientists and analysts to provide high-quality, accessible, and well-structured data.
- Ensure data quality, governance, security, and compliance across pipelines and data stores.
- Monitor, troubleshoot, and improve the performance of data systems and pipelines.
- Participate in code reviews and help establish engineering best practices.
- Mentor junior data engineers and support their technical development.
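Purely as an illustration of the kind of pipeline work described above (not a formal requirement of the role), below is a minimal PySpark sketch. The S3 paths and column names (event_id, event_ts) are hypothetical placeholders: it reads semi-structured JSON from S3, applies light cleansing, and writes a partitioned Delta table.

```python
# Illustrative sketch only; paths and column names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("example-etl").getOrCreate()

# Read raw, semi-structured events from S3 (hypothetical path).
raw = spark.read.json("s3://example-bucket/raw/events/")

# Basic cleansing and typing before loading into the lake.
cleaned = (
    raw
    .filter(F.col("event_id").isNotNull())
    .withColumn("event_ts", F.to_timestamp("event_ts"))
    .withColumn("event_date", F.to_date("event_ts"))
    .dropDuplicates(["event_id"])
)

# Write a partitioned Delta table (Databricks' default table format).
(
    cleaned.write
    .format("delta")
    .mode("overwrite")
    .partitionBy("event_date")
    .save("s3://example-bucket/curated/events/")
)
```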
Qualifications and Requirements:
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- 5+ years of hands-on experience in data engineering, with at least 2 years working with AWS Databricks.
- Strong programming skills in Python for data processing and automation.
- Advanced proficiency in SQL for querying and transforming large datasets.
- Deep experience with Apache Spark/PySpark in a distributed computing environment.
- Solid understanding of data modelling, warehousing, and performance optimization techniques.
- Proficiency with AWS services such as Glue, S3, Lambda, and EMR.
- Experience with version control tools such as Git or AWS CodeCommit.