Key Responsibilities:
- Develop and optimize data pipelines using Spark and Databricks (see the sketch after this list).
- Write complex SQL queries to analyze and manipulate large datasets.
- Implement Python-based scripts for data processing and automation.
- Design and maintain ETL workflows for structured and unstructured data.
- Collaborate with cross-functional teams to ensure high-performance data architectures.
- Ensure data quality, governance & security within the pipelines.
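For context, a minimal sketch of the kind of Spark/SQL pipeline work described above is shown below. The paths, table, and column names (raw_events, event_id, event_ts, user_id) are illustrative assumptions, not specifics of the role.

```python
# Minimal, hypothetical PySpark pipeline: read raw data, apply a simple
# quality filter, aggregate with Spark SQL, and write a curated table.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("example-pipeline").getOrCreate()

# Illustrative input path; replace with a real source in practice.
raw = spark.read.parquet("s3://example-bucket/raw_events/")

# Basic data-quality step: drop duplicates and null timestamps.
clean = raw.dropDuplicates(["event_id"]).filter(F.col("event_ts").isNotNull())
clean.createOrReplaceTempView("clean_events")

# Aggregate with SQL for downstream consumers.
daily_counts = spark.sql("""
    SELECT user_id, DATE(event_ts) AS event_date, COUNT(*) AS events
    FROM clean_events
    GROUP BY user_id, DATE(event_ts)
""")

# Write the result partitioned by date (illustrative output path).
daily_counts.write.mode("overwrite").partitionBy("event_date").parquet(
    "s3://example-bucket/curated/daily_counts/"
)
```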
Mandatory Skills:
- Strong proficiency in SQL, Python, Spark, and Databricks.
- Hands-on experience with distributed computing frameworks.
Good-to-Have Skills (Optional):
- Experience with Airflow / Prefect for workflow orchestration (see the sketch after this list).
- Knowledge of Snowflake for cloud data warehousing.
- Experience with designing & building frameworks for data processing and/or data quality.
- Experience with AWS / Azure / GCP cloud environments.
- Experience with data modeling.
- Exposure to Kafka for real-time data streaming.
- Experience with NoSQL databases.
- Exposure to or knowledge of data visualization tools such as Power BI, Google Looker, or Tableau.
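As an illustration of the orchestration item above, a minimal Airflow DAG might look like the following. The DAG id, schedule, and run_pipeline step are assumptions for the example (Airflow 2.4+ syntax), not part of the role description.

```python
# Hypothetical Airflow DAG that runs one daily pipeline task.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def run_pipeline():
    # Placeholder for a Spark/Databricks job submission step.
    print("pipeline step executed")


with DAG(
    dag_id="example_daily_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    PythonOperator(task_id="run_pipeline", python_callable=run_pipeline)
```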
Preferred Qualifications:
- Bachelor's/Master's degree in Computer Science, Engineering, or related field.
- Strong analytical and problem-solving skills.
- Effective communication and teamwork abilities.