Job Description: Senior Data Engineer (ETL & AI Architecture)
Experience: 6–8 Years
Location: Mumbai (work from office)
Employment Type: Full-time
Reporting To: Lead – Data Analytics & AI
Role Purpose
We are seeking a highly skilled Data Engineer who goes beyond pipeline execution to deliver robust, end-to-end data solutions. The role involves architecting and implementing efficient Silver and Gold data layers, optimizing compute costs through deep parameter tuning, enforcing data quality and governance, and building a semantic layer that enables meaningful, consistent querying of enterprise data.
We value strong foundational data and engineering principles over tool-specific expertise. Candidates from Azure, AWS, or Google Cloud backgrounds are welcome, provided they possess a deep understanding of distributed computing and can optimize systems for performance, cost, reliability, and accuracy.
Key Responsibilities
1. Architecture & Data Modelling
- Design & Strategy: Collaborate with stakeholders to design, document, and implement data structures across Bronze, Silver, and Gold layers to ensure scalability and faster insights.
- Data Modelling: Develop extensible data models that decouple storage from compute for flexibility.
- AI Readiness: Build semantic layers (metadata, relationships, context, feature stores) to support Large Language Models (LLMs) and AI use cases.
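As a purely illustrative sketch (not a requirement of the role), the semantic-layer work described above can start as machine-readable table metadata that an LLM or BI tool consumes; all table and column names below are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class SemanticTable:
    """Metadata record a semantic layer might expose to an LLM or BI tool."""
    name: str
    description: str
    grain: str                                          # what one row represents
    relationships: dict = field(default_factory=dict)   # column -> referenced table

orders = SemanticTable(
    name="gold.orders",
    description="One row per confirmed customer order, net of cancellations.",
    grain="order_id",
    relationships={"customer_id": "gold.customers"},
)

# A downstream agent can ground its SQL generation in this context:
print(orders.relationships["customer_id"])  # → gold.customers
```

In practice this context would live in a catalog or metric-layer tool rather than inline dataclasses, but the shape of the information (description, grain, relationships) is the same.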
2. Engineering, Performance Tuning & FinOps
- Data Engineering: Implement ETL/ELT pipelines aligned with defined architecture.
- Build scalable Silver aggregations and Gold metrics layers.
- Enforce security (RBAC/ABAC), row/column-level controls, and PII handling.
- Maintain data dictionaries, metadata, and lineage as part of delivery standards.
- Implement proactive data quality checks.
- Compute Optimization & Scalability: Optimize compute resources (memory, cores, partitions, executors) based on:
- Data volume (GB to TB scale)
- Transformation complexity
- Data movement and network I/O
- SLA requirements (batch vs real-time)
- Optimize read volumes and cost efficiency.
- Design scalable architectures with minimal manual intervention.
- BAU Management: Handle enhancements, bug fixes, and pipeline optimizations.
- Port pipelines and data when technology stacks evolve.
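To make the tuning dimensions above concrete, here is a minimal, hypothetical sketch of a shuffle-partition heuristic driven by data volume and a target partition size. The 128 MB target is an assumed rule of thumb, not a prescription; real tuning must also weigh transformation complexity, skew, network I/O, and SLA:

```python
import math

def shuffle_partitions(input_gb: float, target_partition_mb: int = 128,
                       min_partitions: int = 8) -> int:
    """Rough shuffle-partition count: total data size / target partition size.

    A common starting point for Spark-style engines; refine with skew,
    executor count, and SLA (batch vs real-time) in mind.
    """
    total_mb = input_gb * 1024
    return max(min_partitions, math.ceil(total_mb / target_partition_mb))

# e.g. applied as: spark.conf.set("spark.sql.shuffle.partitions", shuffle_partitions(500))
print(shuffle_partitions(500))  # 500 GB at ~128 MB per partition → 4000
```

The point of the sketch is that partition counts (like memory and core settings) should be derived from measurable inputs, not left at engine defaults.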
3. Operational Excellence
- Data Quality: Implement automated frameworks (e.g., Great Expectations, dbt tests) to ensure data integrity.
- Orchestration: Manage workflows and dependencies using tools like Airflow, Dagster, or ADF, including SLAs, retries, and alerting.
- DevOps & CI/CD: Apply best practices including version control (Git), automated testing, and deployment pipelines.
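As an illustration of the proactive, automated checks referenced above (framework-agnostic; in practice tools like Great Expectations or dbt tests express these declaratively), a minimal sketch with hypothetical column names:

```python
def run_checks(rows: list[dict]) -> list[str]:
    """Return human-readable data-quality failures (empty list = pass)."""
    failures = []
    # Completeness: key column must not be null
    if any(r.get("order_id") is None for r in rows):
        failures.append("null order_id found")
    # Uniqueness: key column must be unique
    ids = [r["order_id"] for r in rows if r.get("order_id") is not None]
    if len(ids) != len(set(ids)):
        failures.append("duplicate order_id found")
    # Validity: amounts must be non-negative
    if any(r.get("amount", 0) < 0 for r in rows):
        failures.append("negative amount found")
    return failures

sample = [{"order_id": 1, "amount": 10.0}, {"order_id": 1, "amount": -5.0}]
print(run_checks(sample))  # → ['duplicate order_id found', 'negative amount found']
```

Wired into an orchestrator, a non-empty result would fail the task before bad data reaches the Gold layer.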
Skillset & Requirements
- 5–8 years of experience in Data Engineering / Analytics Engineering, with at least 2 years in architecture and solution design.
- Strong problem-solving ability with a practical, execution-focused mindset.
- Experience preparing data for AI/LLM use cases (Vector DBs, Knowledge Graphs, Semantic Layers).
- Expertise in data modelling (Star and Snowflake schemas) and modern open table formats (Delta Lake, Iceberg, Hudi).
- Strong understanding of distributed computing (Spark, Hive, BigQuery), including DAGs, partitioning, and shuffling.
- Proven experience in performance tuning and troubleshooting large-scale systems.
- Programming proficiency in SQL, Python, Spark (Scala is a plus).
Preferred / Good to Have
- Experience with Generative AI architectures (RAG, Vector Databases).
- Exposure to semantic/metric layer tools (LookML, Transform, MetricFlow).
- Ability to prototype dashboards or analytics UI using modern AI tools.
Behavioral Attributes
- High ethical standards
- Strong ownership and accountability
- Problem-solving mindset
- First-principles thinking approach