Key Responsibilities:
- Lead the design, development, and optimization of scalable and secure data pipelines using AWS services such as Glue, S3, Lambda, and EMR, and Databricks Notebooks, Jobs, and Workflows.
- Oversee the development and maintenance of data lakes on AWS Databricks, ensuring performance and scalability.
- Build and manage robust ETL/ELT workflows using Python and SQL, handling both structured and semi-structured data.
- Implement distributed data processing solutions using Apache Spark/PySpark for large-scale data transformation.
- Collaborate with cross-functional teams including data scientists, analysts, and product managers to ensure data is accurate, accessible, and well-structured.
- Enforce best practices for data quality, governance, security, and compliance across the entire data ecosystem.
- Monitor system performance, troubleshoot issues, and drive continuous improvements in data infrastructure.
- Conduct code reviews, define coding standards, and promote engineering excellence across the team.
- Mentor and guide junior data engineers, fostering a culture of technical growth and innovation.
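To illustrate the ETL/ELT work described above, here is a minimal extract-transform-load sketch in Python. It is only a stand-in: SQLite substitutes for the actual warehouse, and the event records, table name, and column names are all hypothetical, not part of this role's actual stack.

```python
import json
import sqlite3

# Hypothetical raw records: semi-structured JSON events (a stand-in for
# objects landing in an S3 bucket or similar source).
raw_events = [
    '{"user_id": 1, "event": "click", "ts": "2024-01-01T10:00:00"}',
    '{"user_id": 2, "event": "view", "ts": "2024-01-01T10:05:00"}',
    '{"user_id": 1, "event": "view"}',  # missing "ts": handled in transform
]

def extract(records):
    """Extract: parse raw JSON strings into Python dicts."""
    return [json.loads(r) for r in records]

def transform(rows):
    """Transform: normalize semi-structured rows, defaulting a missing timestamp."""
    return [
        (row["user_id"], row["event"], row.get("ts", "1970-01-01T00:00:00"))
        for row in rows
    ]

def load(conn, rows):
    """Load: write the normalized rows into a SQL table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events (user_id INT, event TEXT, ts TEXT)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(raw_events)))
count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(count)  # 3
```

In production this same extract/transform/load shape would typically run as a PySpark job on Databricks or Glue rather than in-process Python, with the transform step expressed over DataFrames instead of lists.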
Qualifications & Requirements:
- 8+ years of experience in data engineering with proven leadership in managing data projects and teams.
- Expertise in Python, SQL, and Spark (PySpark), and experience with AWS and Databricks in production environments.
- Strong understanding of modern data architecture, distributed systems, and cloud-native solutions.
- Excellent problem-solving, communication, and collaboration skills.
- Prior experience mentoring team members and contributing to strategic technical decisions is highly desirable.