Project Description:
The role is for a Senior Analytics Engineer specializing in Databricks development and cloud-based data engineering. The position focuses on designing, building, optimizing, and managing large-scale data pipelines and advanced analytics solutions using Databricks on cloud platforms such as Azure, AWS, or GCP.
The engineer will work closely with data platform, analytics, and application teams to ensure scalable, high-performance, secure, and cost-efficient data solutions aligned with organizational architecture and governance standards.
Responsibilities:
- Design and develop ETL/ELT pipelines in Databricks using PySpark, Spark SQL, and Delta Lake.
- Build and maintain high-performance data ingestion and transformation workflows.
- Implement Delta Lake best practices, including ACID transactions, schema evolution, time travel, and optimized storage formats (illustrated in the PySpark sketch after this list).
- Develop and operationalize data pipelines, jobs, and workflows using Databricks Workflows or cloud-native orchestration tools.
- Optimize Spark jobs for performance, scalability, and cost-efficiency.
- Manage and monitor Databricks clusters, including autoscaling, cluster policies, and job cluster configurations.
- Implement CI/CD for Databricks code using tools like GitHub Actions, Azure DevOps, or Jenkins.
- Ensure data security, governance, and compliance, including access controls, encryption, and the use of Unity Catalog.
- Collaborate with data architects, analysts, and business stakeholders to deliver high-quality data solutions.
- Create and maintain technical documentation, adhere to change management processes, and contribute to automation and continuous improvement initiatives.
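As a rough, non-authoritative illustration of the Delta Lake practices referenced above (ACID MERGE upserts, schema evolution, time travel, and file compaction), here is a minimal PySpark sketch. All catalog, table, column, and path names (main.sales.orders, order_id, the /mnt paths) are hypothetical placeholders, and each step is an independent example rather than one pipeline.

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # on Databricks, `spark` is already provided

# ACID upsert: MERGE incoming records into a Delta table (hypothetical names)
target = DeltaTable.forName(spark, "main.sales.orders")
updates = spark.read.parquet("/mnt/landing/orders/")

(target.alias("t")
    .merge(updates.alias("u"), "t.order_id = u.order_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())

# Schema evolution: allow new source columns to be added on append
(updates.write.format("delta")
    .mode("append")
    .option("mergeSchema", "true")
    .saveAsTable("main.sales.orders"))

# Time travel: query an earlier version of the table
previous = spark.sql("SELECT * FROM main.sales.orders VERSION AS OF 5")

# Optimized storage: compact small files and co-locate data by a key
spark.sql("OPTIMIZE main.sales.orders ZORDER BY (order_id)")
```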
Mandatory Skills Description:
- 8 to 10 years of experience in data engineering, including 3 to 5 years of hands-on experience with Databricks.
- Strong expertise in PySpark, Spark SQL, and distributed data processing fundamentals.
- Proven experience building end-to-end ETL/ELT pipelines on cloud platforms.
- Hands-on experience with Delta Lake, optimized file formats (Parquet), and data lake architectures.
- Strong understanding of cloud ecosystems (Azure, AWS, or GCP), including storage, networking, security, and compute services.
- Experience designing and tuning high-performance Spark jobs, including partition optimization, caching, and efficient cluster usage (see the tuning sketch after this list).
- Proficiency in Git, CI/CD pipelines, and version-controlled development workflows.
- Solid understanding of data security, access management, and governance frameworks.
- Strong communication skills and ability to work with cross-functional teams.
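The following is a minimal sketch of the Spark tuning levers named above (shuffle partition sizing, caching of reused DataFrames, broadcast joins, and repartitioning before a wide write). The configuration values, paths, and column names are illustrative assumptions only, not recommended settings.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Right-size shuffle partitions for the workload, or let AQE adjust them
spark.conf.set("spark.sql.shuffle.partitions", "200")
spark.conf.set("spark.sql.adaptive.enabled", "true")

events = spark.read.format("delta").load("/mnt/bronze/events")

# Cache a DataFrame that feeds several downstream aggregations
enriched = events.filter(F.col("event_date") >= "2024-01-01").cache()
enriched.count()  # materialize the cache before reuse

# Broadcast a small dimension table to avoid a shuffle join
dims = spark.read.format("delta").load("/mnt/bronze/dim_country")
joined = enriched.join(F.broadcast(dims), "country_code")

# Repartition by a well-distributed key before a wide write
(joined.repartition(64, "country_code")
    .write.format("delta").mode("overwrite")
    .save("/mnt/silver/events_by_country"))
```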
Nice-to-Have Skills Description:
- Experience with Azure Databricks specifically (preferred), including integration with Azure Data Factory (ADF), Azure Data Lake Storage (ADLS), and Azure DevOps.
- Knowledge of MLflow, feature engineering, or machine learning workflows (a minimal MLflow sketch follows this list).
- Exposure to automation frameworks, DevOps practices, and infrastructure-as-code (Terraform, ARM, CloudFormation).
- Familiarity with Unity Catalog, Lakehouse architecture, and data governance concepts.
- Experience working within structured change, incident, and release management frameworks.
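For the MLflow item above, a minimal experiment-tracking sketch follows; the experiment path, run name, parameter, and metric value are hypothetical examples, not prescribed values.

```python
import mlflow

# Hypothetical workspace experiment path
mlflow.set_experiment("/Shared/churn-model")

with mlflow.start_run(run_name="baseline"):
    mlflow.log_param("max_depth", 8)      # example hyperparameter
    mlflow.log_metric("auc", 0.87)        # placeholder evaluation metric
    # mlflow.sklearn.log_model(model, "model")  # log a trained model artifact
```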