Role Overview
We are looking for experienced Data Engineers with strong expertise in Python, PySpark, and Databricks to support enterprise-scale data ingestion and processing platforms. The ideal candidate should have hands-on experience building scalable data pipelines, orchestration frameworks, and cloud-based data solutions using modern Lakehouse architectures.
The role will primarily focus on:
- Ingest Factory
- Data Processing Factory
- Data Pipeline Orchestration
- Databricks-based ETL/ELT Solutions
Key Responsibilities
- Design, develop, and optimize scalable data ingestion and processing pipelines.
- Build and maintain ETL/ELT workflows using Python, PySpark, and Databricks.
- Develop orchestration workflows using LakeFlow, Jobs, Tasks, and Declarative Pipelines.
- Implement reusable frameworks, metadata-driven orchestration, and automation patterns.
- Ensure data quality, monitoring, alerting, and lineage across data pipelines.
- Collaborate with business and technical teams to gather data requirements and shape solution design.
- Participate in Agile/SCRUM ceremonies, code reviews, and deployment activities.
- Optimize Spark workloads, SQL queries, and compute performance within Databricks.
- Support CI/CD and Infrastructure-as-Code practices using Git and Terraform.
Required Technical Skills
Core Technologies
- Python
- PySpark
- Databricks
- SQL
- Spark
- Git & CI/CD
- Terraform
Detailed Skill Requirements
Databricks Expertise
- Strong hands-on experience with:
  - Databricks Notebooks
  - Jobs & Workload Optimization
  - Connectors & Data Acquisition
  - LakeFlow Orchestration
  - Declarative Pipelines
  - Delta Lake
- Experience implementing:
  - Data lineage
  - Monitoring & alerting frameworks
  - Data quality checks
  - Data product concepts
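The rule-based data quality checks mentioned above can take many forms; the sketch below is a minimal, framework-agnostic illustration, not a reference to any specific Databricks feature. Function names and rules (`check_not_null`, `check_in_range`) are hypothetical.

```python
# Illustrative only: minimal rule-based data quality checks over rows
# represented as dicts. In a real pipeline these rules would run against
# Spark DataFrames and feed monitoring/alerting.

def check_not_null(rows, column):
    """Return indices of rows where `column` is missing or None."""
    return [i for i, row in enumerate(rows) if row.get(column) is None]

def check_in_range(rows, column, lo, hi):
    """Return indices of rows where `column` falls outside [lo, hi]."""
    return [
        i for i, row in enumerate(rows)
        if row.get(column) is not None and not (lo <= row[column] <= hi)
    ]

def run_checks(rows, checks):
    """Run each named check and report failing row indices."""
    return {name: fn(rows) for name, fn in checks.items()}

rows = [
    {"id": 1, "amount": 50},
    {"id": 2, "amount": None},
    {"id": 3, "amount": 999},
]
report = run_checks(rows, {
    "amount_not_null": lambda r: check_not_null(r, "amount"),
    "amount_in_range": lambda r: check_in_range(r, "amount", 0, 100),
})
# report == {"amount_not_null": [1], "amount_in_range": [2]}
```

A report like this can drive alerting thresholds or quarantine failing rows before they reach downstream consumers.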
Python & PySpark
- Strong understanding of distributed processing using Spark.
- Good knowledge of Python coding standards and package management.
- Ability to differentiate single-node vs distributed Spark execution patterns.
- Experience building scalable PySpark transformations and frameworks.
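Reusable PySpark frameworks often come down to composing small, single-purpose transformation functions; PySpark's `DataFrame.transform` supports exactly this kind of chaining. The sketch below shows the same composition pattern on plain Python data so it stays self-contained; the step names are illustrative.

```python
from functools import reduce

# Illustrative sketch: composing small transformation steps into a
# pipeline. PySpark's DataFrame.transform enables the same chaining on
# DataFrames; lists of dicts are used here to keep the example runnable.

def drop_nulls(rows, column):
    """Keep only rows where `column` has a value."""
    return [r for r in rows if r.get(column) is not None]

def rename(rows, old, new):
    """Rename a column in every row."""
    return [{(new if k == old else k): v for k, v in r.items()} for r in rows]

def pipeline(rows, *steps):
    """Apply each step in order, like chained DataFrame.transform calls."""
    return reduce(lambda acc, step: step(acc), steps, rows)

rows = [{"cust_id": 1, "amt": 10}, {"cust_id": 2, "amt": None}]
clean = pipeline(
    rows,
    lambda r: drop_nulls(r, "amt"),
    lambda r: rename(r, "amt", "amount"),
)
# clean == [{"cust_id": 1, "amount": 10}]
```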
Data Engineering & Data Literacy
- Strong understanding of:
  - Lakehouse and Data Warehouse architectures
  - ETL/ELT patterns
  - Source system integration patterns
  - Data models and schema design
- Experience with data quality validation and monitoring practices.
Software Engineering Practices
- Strong understanding of:
  - SOLID principles
  - DRY principles
  - Reusable framework design
  - Metadata-driven orchestration
- Experience with:
  - Agile/SCRUM methodologies
  - Git workflows and pull requests
  - Unit, integration, and end-to-end testing
  - VS Code / Cursor IDE tools
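Metadata-driven orchestration, as listed above, typically means the pipeline's structure lives in configuration rather than code. A minimal sketch, assuming a simple task-to-dependencies metadata format (the format and task names here are hypothetical):

```python
from graphlib import TopologicalSorter

# Hypothetical metadata: each task names its upstream dependencies.
# In practice this might come from YAML, a Delta table, or job config.
PIPELINE_METADATA = {
    "ingest_orders": [],
    "ingest_customers": [],
    "join_orders_customers": ["ingest_orders", "ingest_customers"],
    "publish_gold": ["join_orders_customers"],
}

def execution_order(metadata):
    """Derive a dependency-respecting run order from metadata alone."""
    return list(TopologicalSorter(metadata).static_order())

order = execution_order(PIPELINE_METADATA)
# Upstream tasks always precede their dependents; "publish_gold" runs last.
```

Adding a task then means editing metadata, not orchestration code, which is the reusability the framework-design bullet points at.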
Spark & SQL Optimization
- Experience troubleshooting distributed Spark workloads.
- Expertise in:
  - Query optimization
  - Join optimization
  - Compute and table performance tuning
  - Efficient filtering and workload reduction strategies
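Filtering early is a core workload-reduction strategy: in Spark it shows up as predicate pushdown and filtering before a join. The plain-Python sketch below illustrates why ordering matters (the data and sizes are made up):

```python
# Illustrative: filtering before a join shrinks the work the join does.
# In Spark the same principle drives predicate pushdown and early pruning.

orders = [{"id": i, "region": "EU" if i % 4 == 0 else "US"} for i in range(1000)]
customers = [{"id": i} for i in range(1000)]

# Join first, filter after: the join touches every order row.
joined_all = [(o, c) for o in orders for c in customers if o["id"] == c["id"]]
late = [p for p in joined_all if p[0]["region"] == "EU"]

# Filter first: the join only sees the 250 EU orders.
eu_orders = [o for o in orders if o["region"] == "EU"]
early = [(o, c) for o in eu_orders for c in customers if o["id"] == c["id"]]

# Same result either way, but the second join processed a quarter of the rows.
```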
DevOps & Automation
- Hands-on experience with:
  - CI/CD pipelines
  - Git version control
  - Terraform Infrastructure-as-Code (IaC)
  - Deployment automation practices
Required Experience
- 5+ years of experience in Cloud Data Engineering.
- Proven experience designing and building production-grade data platforms.
- Experience working independently and collaboratively within Agile teams.