GCP Data Engineer

zorba ai

Pune, India

8-10 Years

Posted a day ago

Job Description

Key Required Skills

Programming & Data Processing: Advanced SQL, Python; Scala/Java for Spark/Flink (Go is a plus)
Cloud Data Platforms: Hands-on with BigQuery, Snowflake, Redshift, Synapse/Databricks SQL; strong DW vs MPP understanding
Data Modelling: Dimensional modelling, Data Vault 2.0, SCDs, schema evolution
Streaming: Kafka, Pub/Sub, or Kinesis; Spark Streaming or Flink; schema management and processing reliability
Orchestration & ELT: Airflow/Composer, dbt or similar tools
CI/CD & Platform Engineering: Git workflows, automated pipelines, Terraform/CloudFormation, Docker/Kubernetes
Data Quality & Governance: Data contracts, testing frameworks, lineage/catalog tools
BI & Semantics: KPI/metric modelling, semantic layers, enterprise BI exposure
AI Readiness: Feature engineering, ML/GenAI data patterns, knowledge layers
Security & Compliance: IAM, encryption, masking/tokenization, auditability

Key Responsibilities

Build reusable pipeline frameworks (batch & streaming) with standard templates
Design analytics-ready data models (star/snowflake, Data Vault 2.0)
Optimize cloud data warehouse performance and cost
Develop robust streaming pipelines with SLA-driven delivery
Implement data quality frameworks and governance controls
Enable metadata-driven engineering and lineage tracking
Establish semantic layers for BI and self-service analytics
Prepare AI-ready data foundations (feature datasets, knowledge models)
Ensure observability, monitoring, and FinOps optimization
Drive engineering excellence through CI/CD, IaC, and best practices

Ideal Candidate Profile – Must-Have Skill Set

Mandatory Technical Skills

Advanced SQL and strong Python programming
Hands-on experience with Spark using Scala or Java
Strong expertise in at least one cloud data warehouse/platform:

Snowflake
BigQuery
Redshift
Synapse
Databricks SQL

Strong understanding of Data Warehousing and MPP architecture

Mandatory Data Engineering Experience

Dimensional Modelling (Star/Snowflake Schema)
Data Vault 2.0
Slowly Changing Dimensions (SCDs)
Batch and Streaming pipeline development

Mandatory Streaming Skills

Kafka, Pub/Sub, or Kinesis
Spark Streaming or Apache Flink
Real-time data processing and schema management

Mandatory Orchestration & ELT Skills

Airflow / Cloud Composer
dbt or equivalent ELT framework

Mandatory DevOps & Platform Skills

Git-based CI/CD workflows
Terraform or CloudFormation
Docker & Kubernetes

Mandatory Governance & Quality Skills

Data quality frameworks and testing
Metadata, lineage, and governance implementation
Security concepts:

IAM
Encryption
Masking/tokenization

Mandatory BI & Analytics Exposure

Semantic layer and KPI/metric modelling
Enterprise BI and self-service analytics exposure

AI/ML Readiness (Must Have)

Feature engineering concepts
AI/ML or GenAI data preparation exposure
Knowledge layer/data foundation understanding

Ideal Experience Range

8+ years overall experience in Data Engineering
Strong experience in enterprise cloud data platform implementations
Experience building scalable, reusable data frameworks
