
Principal Data Engineer
Experience: 9+ Years
Work Mode: Onsite
Location: Bangalore
Principal Data Platform Engineer
Architecture: Lakehouse (Medallion: Bronze/Silver/Gold)
Compute: Apache Spark (Expert level)
Storage/Table Format: Delta Lake (Required), Iceberg (Strong Plus)
Transformation: dbt (Expert level)
Orchestration: Airflow, Cosmos
Infrastructure: Cloud-native (GCP preferred) + Databricks/Commercial tooling
Patterns: Microservices, Event-driven, CI/CD, IaC (Terraform)
Core Technical Requirements
1. Data Engineering & Spark Internals
Deep Spark: You must understand RDDs, DataFrames, Spark SQL, and internals (Shuffle,
Partitioning, Memory Management, Catalyst Optimizer).
Pipeline Mastery: Building idempotent, self-healing ELT/ETL pipelines, with experience in schema evolution and handling late-arriving data (see the sketch after this list).
Lakehouse ACID: Expert knowledge of transaction logs, time travel, and file compaction in
Delta/Iceberg.
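To make the "idempotent pipeline" and Delta MERGE expectations concrete, here is a minimal PySpark sketch of a re-runnable Silver-layer upsert. It is only an illustration: the table path, the event_id/updated_at columns, and the session setup are hypothetical, not part of this role's actual stack.

```python
# Minimal sketch of an idempotent Silver-layer upsert with Delta Lake MERGE.
# Assumes a Delta-enabled Spark session; the path and the event_id/updated_at
# columns are hypothetical.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, DataFrame

spark = SparkSession.builder.appName("silver_upsert").getOrCreate()
# Let MERGE pick up new source columns automatically (schema evolution).
spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled", "true")

def upsert_events(batch: DataFrame, target_path: str = "/lake/silver/events") -> None:
    """Re-runnable: replaying the same batch leaves the target table unchanged."""
    target = DeltaTable.forPath(spark, target_path)
    (
        target.alias("t")
        .merge(batch.alias("s"), "t.event_id = s.event_id")
        # Only apply source rows that are at least as recent, so late-arriving
        # duplicates of older versions cannot overwrite newer data.
        .whenMatchedUpdateAll(condition="s.updated_at >= t.updated_at")
        .whenNotMatchedInsertAll()
        .execute()
    )
```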
2. Software Architecture & Design
Engineering First: This isn't just SQL and scripts. You apply SOLID principles and design patterns, and write production-grade Python/Scala/Java.
Integration: Experience building and consuming Microservices. Knowledge of API design (REST/gRPC) and message brokers (Kafka/PubSub); a consumer sketch follows this list.
System Design: Experience building a platform from scratch. You know how to design for 99.9%
availability and horizontal scalability.
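As a small illustration of the message-broker integration mentioned above, below is a hedged Python sketch of a Kafka consumer feeding the Bronze layer. The topic name, broker address, group id, and payload shape are hypothetical assumptions.

```python
# Minimal sketch: consume JSON events from Kafka and hand them to a Bronze-layer writer.
# Topic, broker address, group id, and payload shape are hypothetical assumptions.
import json
from kafka import KafkaConsumer  # kafka-python

consumer = KafkaConsumer(
    "orders.v1",                              # hypothetical topic
    bootstrap_servers="localhost:9092",
    group_id="lakehouse-bronze-ingest",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    enable_auto_commit=False,                 # commit only after the event is safely landed
)

def land_in_bronze(event: dict) -> None:
    """Placeholder for the actual Bronze-layer write (e.g. append to a Delta table)."""
    ...

for message in consumer:
    land_in_bronze(message.value)
    consumer.commit()                         # at-least-once delivery into the lake
```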
3. Data Modeling & dbt
Modeling: Expert in dimensional modeling (Kimball), Data Vault 2.0, or OBT (One Big Table) for
high-performance analytics.
dbt Power User: Advanced dbt usage (Macros, Packages, Custom Tests, dbt Mesh). You treat dbt projects like software repositories (version control, PR reviews, CI); a CI sketch follows this list.
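To illustrate treating a dbt project like a software repository, here is a minimal sketch of a CI step that builds and tests only PR-modified models using dbt's programmatic invocation API (dbt-core 1.5+). The selector, the artifacts path, and the failure handling are assumptions, not a prescribed setup.

```python
# Minimal sketch: run dbt inside a CI job via the programmatic API (dbt-core 1.5+).
# The selector and state handling are assumptions; state:modified needs artifacts
# (manifest.json) from a previous production run, assumed here under prod-artifacts/.
import sys
from dbt.cli.main import dbtRunner

runner = dbtRunner()

# Build and test only the models changed in this PR, plus everything downstream of them.
result = runner.invoke(
    ["build", "--select", "state:modified+", "--state", "prod-artifacts/", "--fail-fast"]
)

if not result.success:
    sys.exit(1)  # fail the pipeline so the PR cannot merge with broken models or tests
```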
4. Cloud & Platform
Cloud Native: Deep understanding of IAM, VPCs, Object Storage, and serverless compute.
Migrations: Proven track record of moving petabyte-scale data from legacy systems (on-prem, Redshift, Snowflake) to a Lakehouse without data loss; a reconciliation sketch follows this list.
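As an illustration of the "without data loss" requirement, below is a hedged PySpark sketch of a post-migration reconciliation check. The table paths, the business key, and the checksum choice are illustrative assumptions.

```python
# Minimal sketch of a post-migration reconciliation check: compare row counts and a
# key checksum between the legacy extract and the migrated Delta table.
# Paths, table layout, and the business key (order_id) are hypothetical assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("migration_reconciliation").getOrCreate()

legacy = spark.read.parquet("/staging/legacy_export/orders")       # snapshot of the source
migrated = spark.read.format("delta").load("/lake/gold/orders")    # Lakehouse target

def fingerprint(df):
    """Row count plus a sum of CRC32 over the business key as a cheap content checksum."""
    return df.agg(
        F.count("*").alias("row_count"),
        F.sum(F.crc32(F.col("order_id").cast("string"))).alias("key_checksum"),
    ).collect()[0]

source_fp, target_fp = fingerprint(legacy), fingerprint(migrated)
if source_fp != target_fp:
    raise ValueError(f"Reconciliation failed: source={source_fp}, target={target_fp}")
```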
Key Deliverables (First 6-12 Months)
Platform Zero: Evaluate, select, and deploy the foundational Lakehouse infrastructure.
Core Frameworks: Build the reusable libraries and templates the rest of the engineering team will use to build pipelines.
Legacy Decommission: Design the migration roadmap to move all high-priority finance/business data to the new stack.
Performance Baseline: Reduce Spark/Cloud costs by at least 20% through better resource management (see the configuration sketch after this list).
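To make the resource-management angle concrete, here is a hedged sketch of the kind of Spark settings a cost-baseline exercise typically tunes. The specific values are illustrative assumptions, not recommendations for any particular workload.

```python
# Minimal sketch of resource-management settings a cost-baseline exercise typically tunes.
# Values are illustrative assumptions, not recommendations for any particular workload.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("cost_baseline")
    # Adaptive Query Execution: let Spark right-size shuffle partitions and handle skew.
    .config("spark.sql.adaptive.enabled", "true")
    .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
    .config("spark.sql.adaptive.skewJoin.enabled", "true")
    # Dynamic allocation: release idle executors so the cluster can scale down between
    # stages (needs shuffle tracking or an external shuffle service).
    .config("spark.dynamicAllocation.enabled", "true")
    .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
    .config("spark.dynamicAllocation.minExecutors", "2")
    .config("spark.dynamicAllocation.maxExecutors", "50")
    .config("spark.dynamicAllocation.executorIdleTimeout", "120s")
    .getOrCreate()
)
```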
The Plus List
MLOps: Building feature stores and model deployment triggers.
GCP Specialization: BigQuery (as a Lakehouse layer), Dataproc, and Cloud Composer.
Observability: Implementing Data Quality monitoring (Great Expectations, Monte Carlo) and
OpenTelemetry.
Job ID: 146457021