Job Description: Senior Data Engineer (ETL & AI Architecture)
Experience: 6–8 Years
Location: Mumbai (work from office)
Employment Type: Full-time
Reporting To: Lead – Data Analytics & AI
Role Purpose
We are seeking a highly skilled Data Engineer who goes beyond pipeline execution to deliver robust, end-to-end data solutions. The role involves architecting and implementing efficient Silver and Gold data layers, optimizing compute costs through deep parameter tuning, enforcing data quality and governance, and building a semantic layer that enables meaningful, consistent querying of enterprise data.
We value strong foundational data and engineering principles over tool-specific expertise. Candidates from Azure, AWS, or Google Cloud backgrounds are welcome, provided they possess a deep understanding of distributed computing and can optimize systems for performance, cost, reliability, and accuracy.
Key Responsibilities
1. Architecture & Data Modelling
- Design & Strategy: Collaborate with stakeholders to design, document, and implement data structures across Bronze, Silver, and Gold layers to ensure scalability and faster insights.
- Data Modelling: Develop extensible data models that decouple storage from compute for flexibility.
- AI Readiness: Build semantic layers (metadata, relationships, context, feature stores) to support Large Language Models (LLMs) and AI use cases.
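As a purely illustrative sketch (not a requirement of the role), the semantic-layer work described above can start as machine-readable table metadata that an LLM or BI tool consumes; all table and column names below are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class SemanticTable:
    """Metadata record a semantic layer might expose to an LLM or BI tool."""
    name: str
    description: str
    grain: str                                          # what one row represents
    relationships: dict = field(default_factory=dict)   # column -> referenced table

orders = SemanticTable(
    name="gold.orders",
    description="One row per confirmed customer order, net of cancellations.",
    grain="order_id",
    relationships={"customer_id": "gold.customers"},
)

# A downstream agent can ground its SQL generation in this context:
print(orders.relationships["customer_id"])  # → gold.customers
```

In practice this context would live in a catalog or metric-layer tool rather than inline dataclasses, but the shape of the information (description, grain, relationships) is the same.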
2. Engineering, Performance Tuning & FinOps
- Data Engineering: Implement ETL/ELT pipelines aligned with defined architecture.
- Build scalable Silver aggregations and Gold metrics layers.
- Enforce security (RBAC/ABAC), row/column-level controls, and PII handling.
- Maintain data dictionaries, metadata, and lineage as part of delivery standards.
- Implement proactive data quality checks.
- Compute Optimization & Scalability: Optimize compute resources (memory, cores, partitions, executors) based on:
- Data volume (GB to TB scale)
- Transformation complexity
- Data movement and network I/O
- SLA requirements (batch vs real-time)
- Optimize read volumes and cost efficiency.
- Design scalable architectures with minimal manual intervention.
- BAU Management: Handle enhancements, bug fixes, and pipeline optimizations.
- Port pipelines and data when technology stacks evolve.
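To make the tuning dimensions above concrete, here is a minimal, hypothetical sketch of a shuffle-partition heuristic driven by data volume and a target partition size. The 128 MB target is an assumed rule of thumb, not a prescription; real tuning must also weigh transformation complexity, skew, network I/O, and SLA:

```python
import math

def shuffle_partitions(input_gb: float, target_partition_mb: int = 128,
                       min_partitions: int = 8) -> int:
    """Rough shuffle-partition count: total data size / target partition size.

    A common starting point for Spark-style engines; refine with skew,
    executor count, and SLA (batch vs real-time) in mind.
    """
    total_mb = input_gb * 1024
    return max(min_partitions, math.ceil(total_mb / target_partition_mb))

# e.g. applied as: spark.conf.set("spark.sql.shuffle.partitions", shuffle_partitions(500))
print(shuffle_partitions(500))  # 500 GB at ~128 MB per partition → 4000
```

The point of the sketch is that partition counts (like memory and core settings) should be derived from measurable inputs, not left at engine defaults.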
3. Operational Excellence
- Data Quality: Implement automated frameworks (e.g., Great Expectations, dbt tests) to ensure data integrity.
- Orchestration: Manage workflows and dependencies using tools like Airflow, Dagster, or ADF, including SLAs, retries, and alerting.
- DevOps & CI/CD: Apply best practices including version control (Git), automated testing, and deployment pipelines.
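As an illustration of the proactive, automated checks referenced above (framework-agnostic; in practice tools like Great Expectations or dbt tests express these declaratively), a minimal sketch with hypothetical column names:

```python
def run_checks(rows: list[dict]) -> list[str]:
    """Return human-readable data-quality failures (empty list = pass)."""
    failures = []
    # Completeness: key column must not be null
    if any(r.get("order_id") is None for r in rows):
        failures.append("null order_id found")
    # Uniqueness: key column must be unique
    ids = [r["order_id"] for r in rows if r.get("order_id") is not None]
    if len(ids) != len(set(ids)):
        failures.append("duplicate order_id found")
    # Validity: amounts must be non-negative
    if any(r.get("amount", 0) < 0 for r in rows):
        failures.append("negative amount found")
    return failures

sample = [{"order_id": 1, "amount": 10.0}, {"order_id": 1, "amount": -5.0}]
print(run_checks(sample))  # → ['duplicate order_id found', 'negative amount found']
```

Wired into an orchestrator, a non-empty result would fail the task before bad data reaches the Gold layer.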
Skillset & Requirements
- 5–8 years of experience in Data Engineering / Analytics Engineering, with at least 2 years in architecture and solution design.
- Strong problem-solving ability with a practical, execution-focused mindset.
- Experience preparing data for AI/LLM use cases (Vector DBs, Knowledge Graphs, Semantic Layers).
- Expertise in data modelling (Star and Snowflake schemas) and modern open table formats (Delta Lake, Iceberg, Hudi).
- Strong understanding of distributed computing (Spark, Hive, BigQuery), including DAGs, partitioning, and shuffling.
- Proven experience in performance tuning and troubleshooting large-scale systems.
- Programming proficiency in SQL, Python, Spark (Scala is a plus).
Preferred / Good to Have
- Experience with Generative AI architectures (RAG, Vector Databases).
- Exposure to semantic/metric layer tools (LookML, Transform, MetricFlow).
- Ability to prototype dashboards or analytics UI using modern AI tools.
Behavioral Attributes
- High ethical standards
- Strong ownership and accountability
- Problem-solving mindset
- First-principles thinking approach