About the Role
We are looking for a hands-on Data Engineer (4-8 years of experience) to build and optimize scalable data pipelines and analytical datasets on the Databricks platform. You will work closely with Analytics/BI, Product, and Business teams to enable data-driven decision-making, translating business needs into reliable, performant data solutions. Retail / eCommerce domain exposure is a strong plus.
Key Responsibilities
- Design, develop, and maintain robust ETL/ELT pipelines using Databricks (Spark) and Python (PySpark); a minimal sketch follows this list.
- Develop and optimize complex transformations using SQL (joins, window functions, CTEs, query tuning).
- Build curated datasets and data models to support reporting, dashboards, and advanced analytics use cases.
- Implement pipeline reliability best practices: data quality checks, monitoring, alerting, and reconciliation.
- Optimize Databricks workloads for performance and cost (cluster sizing, partitioning strategies, caching, file formats).
- Work with structured and semi-structured data (JSON, CSV, Parquet/Delta) and handle schema evolution.
- Collaborate with stakeholders to understand business KPIs and deliver data solutions aligned to retail/eCommerce metrics (sales, orders, returns, inventory, customer cohorts).
- Follow engineering best practices for version control (Git), documentation, reusable code patterns, and testing.
- Good to have: support or migrate existing Alteryx workflows to Python/Databricks pipelines.
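
For illustration, here is a minimal PySpark sketch of the kind of pipeline these responsibilities describe: ingest semi-structured JSON, apply a basic data-quality gate, transform with a window function, and write a partitioned Delta table. All paths, table names, and column names (orders, order_id, customer_id, order_ts) are hypothetical placeholders, not part of this role's actual stack.

```python
# Minimal sketch of a curated-orders pipeline; names and paths are hypothetical.
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders_etl").getOrCreate()

# Ingest semi-structured source data (JSON) into a DataFrame.
raw = spark.read.json("/mnt/raw/orders/")

# Basic data-quality gate: fail fast if required keys are missing.
bad_rows = raw.filter(F.col("order_id").isNull()).count()
if bad_rows > 0:
    raise ValueError(f"{bad_rows} rows missing order_id; aborting load")

# Analytical transformation with a window function:
# rank each customer's orders by recency.
w = Window.partitionBy("customer_id").orderBy(F.col("order_ts").desc())
curated = (
    raw.withColumn("order_rank", F.row_number().over(w))
       .withColumn("order_date", F.to_date("order_ts"))
)

# Write a curated, partitioned Delta table for downstream reporting
# (Delta format is available out of the box on Databricks clusters).
(curated.write
        .format("delta")
        .mode("overwrite")
        .partitionBy("order_date")
        .save("/mnt/curated/orders/"))
```
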
Must-Have Skills & Qualifications
- 4-8 years of experience in Data Engineering / Data Warehousing / Big Data.
- Strong hands-on experience with Databricks (Jobs/Workflows, notebooks, cluster concepts, Spark tuning fundamentals).
- Strong programming skills in Python (PySpark preferred).
- Excellent SQL skills, including performance tuning and writing complex analytical queries.
- Experience building scalable pipelines and working with large datasets in distributed environments.
- Strong understanding of data engineering concepts: ETL/ELT, orchestration, data validation, and observability.
- Familiarity with modern data storage formats and practices (Delta/Parquet, partitioning, incremental loads); see the incremental-load sketch after this list.
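
As a concrete (and deliberately simplified) example of the incremental-load pattern referenced above, the sketch below uses a high-water-mark column to append only new rows. The table and column names (sales_raw, sales_curated, load_ts) are illustrative assumptions, and the snippet presumes the curated Delta table already exists.

```python
# Watermark-driven incremental load; table and column names are illustrative.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Find the high-water mark already present in the curated table.
last_loaded = (
    spark.table("sales_curated")
         .agg(F.max("load_ts").alias("hwm"))
         .collect()[0]["hwm"]
)

# Pull only source rows newer than the watermark.
new_rows = spark.table("sales_raw")
if last_loaded is not None:
    new_rows = new_rows.filter(F.col("load_ts") > F.lit(last_loaded))

# Append the increment; partition pruning keeps downstream reads cheap.
new_rows.write.format("delta").mode("append").saveAsTable("sales_curated")
```
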
Good-to-Have Skills
- Retail / eCommerce domain knowledge (customer behavior, funnel metrics, pricing/promotions, inventory, catalog, order lifecycle).
- Alteryx (workflow development, optimization, scheduling, or migration to Databricks).
- Experience with Lakehouse patterns and Delta Lake features (e.g., MERGE, OPTIMIZE, Z-ORDER); see the upsert sketch after this list.
- Experience with orchestration tools (e.g., Airflow, ADF, Databricks Workflows).
- Cloud experience: AWS / Azure / GCP (S3/ADLS/GCS, IAM basics, security controls).
- CI/CD exposure for data pipelines, code reviews, and automated deployments.
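
The Delta Lake features named above combine naturally into an upsert-and-compact pattern; the sketch below shows one plausible shape using the delta-spark Python API and Databricks SQL. Table names and join keys (orders, orders_staging, order_id, customer_id) are illustrative assumptions.

```python
# Delta Lake upsert-and-compact sketch; table names and keys are illustrative.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

updates = spark.table("orders_staging")       # assumed staging table
target = DeltaTable.forName(spark, "orders")  # assumed Delta target table

# Upsert: update matched orders, insert new ones.
(target.alias("t")
       .merge(updates.alias("s"), "t.order_id = s.order_id")
       .whenMatchedUpdateAll()
       .whenNotMatchedInsertAll()
       .execute())

# Compact small files and co-locate a frequent filter column.
spark.sql("OPTIMIZE orders ZORDER BY (customer_id)")
```
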
Preferred Traits
- Strong problem-solving skills and a mindset for root-cause analysis.
- Ownership and accountability for production-grade pipelines.
- Ability to communicate with both technical and non-technical stakeholders.
- Comfort working in fast-paced environments with evolving requirements.