Saarthee

ETL Architect

Posted 6 hours ago

Job Description

Position: Data Engineer (ETL & AI Architecture)

Location: Bangalore

Work Mode: Hybrid

Experience: 6-8 years

Position Summary:

  • We are looking for a Data Engineer who moves beyond pipeline execution to true data solutioning and implementation. You will architect and implement efficient Silver and Gold data layers, optimize compute costs through deep parameter tuning, enforce data quality and governance, and build and orchestrate the semantic layer so that enterprise data can be understood and queried meaningfully and consistently.
  • We value fundamental data and engineering principles over syntax memorization. Whether your background is in Azure, Google Cloud, or AWS, we are looking for someone who understands how distributed computing works under the hood and can fine-tune it for speed, cost, reliability, and accuracy.

Role, Responsibilities, and Duties:

Data Architecture & Engineering

  • Design and implement scalable ETL/ELT pipelines and distributed data processing systems
  • Build and manage Bronze, Silver, and Gold data layers for analytics and AI consumption
  • Architect extensible dimensional data models using Star Schema and Snowflake methodologies
  • Work with modern lakehouse table formats such as Delta Lake, Iceberg, or Hudi
  • Build scalable and reliable data platforms capable of handling large-scale structured and unstructured datasets
  • Design systems with minimal manual intervention and high scalability across multiple business use cases
  • Develop reusable frameworks, metadata-driven pipelines, and semantic data layers

AI & Modern Data Systems

  • Build AI/LLM-ready data architectures for enterprise use cases
  • Prepare and structure datasets for Retrieval-Augmented Generation (RAG) architectures
  • Work with Vector Databases, Knowledge Graphs, and semantic layers supporting Generative AI applications
  • Integrate modern AI-driven workflows into enterprise data platforms
  • Collaborate with business and product teams to identify practical AI use cases that create business value
  • Support AI-enabled analytics, intelligent querying, and contextual data discovery

Performance Optimization & Scalability

  • Optimize Spark jobs, distributed workloads, and compute infrastructure for cost and performance
  • Tune memory, executors, partitions, shuffling, and serialization for large-scale workloads
  • Improve processing efficiency across batch and near real-time pipelines
  • Minimize network I/O and optimize read/write operations for high-volume datasets
  • Analyze and troubleshoot slow stages, spill-to-disk issues, and performance bottlenecks
  • Balance SLA requirements with infrastructure cost optimization
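To make the tuning levers above concrete, here is a minimal, hypothetical sketch: a rule-of-thumb helper that sizes shuffle partitions from input volume, alongside a baseline set of Spark settings. The property names are real Spark configuration keys, but the specific values (and the 128 MB target) are illustrative assumptions for a mid-sized batch job, not recommendations.

```python
import math

# Illustrative baseline for a shuffle-heavy batch job. Property names are
# standard Spark configuration keys; the values are assumptions, not tuned
# recommendations for any particular cluster.
BASELINE_CONF = {
    "spark.executor.memory": "8g",
    "spark.executor.cores": "4",
    "spark.sql.adaptive.enabled": "true",  # let AQE coalesce skewed shuffles
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
}

def suggested_shuffle_partitions(input_gb: float, target_partition_mb: int = 128) -> int:
    """Rule of thumb: aim for roughly 128 MB per shuffle partition."""
    return max(1, math.ceil(input_gb * 1024 / target_partition_mb))

# A 50 GB shuffle at ~128 MB per partition suggests 400 partitions,
# which would replace the default spark.sql.shuffle.partitions of 200.
print(suggested_shuffle_partitions(50))
```

In practice a number like this is a starting point; the final value comes from inspecting stage-level metrics (task duration, spill-to-disk, shuffle read/write sizes) in the Spark UI.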

Governance & Operational Excellence

  • Implement data quality frameworks and automated validation checks
  • Enforce RBAC/ABAC, row-level security, column-level security, masking, and governance standards
  • Maintain metadata, lineage, and data dictionary standards across pipelines
  • Build orchestration workflows using tools like Airflow, Dagster, or ADF
  • Manage DAG dependencies, retries, backfills, and monitoring workflows
  • Apply CI/CD and DevOps best practices including Git, automated testing, and deployment pipelines
  • Support BAU activities, enhancements, optimization, and production issue resolution
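As a sketch of the automated validation checks mentioned above, here are two of the most common ones, written against plain Python dicts for illustration. In a real pipeline these would be Spark or SQL checks run before promoting a table from Silver to Gold; the `orders` rows and the 50% null threshold are invented for the example.

```python
def null_rate(rows, column):
    """Fraction of rows where `column` is missing or None."""
    if not rows:
        return 0.0
    nulls = sum(1 for r in rows if r.get(column) is None)
    return nulls / len(rows)

def is_unique(rows, column):
    """True if `column` holds a distinct value on every row (a key check)."""
    values = [r.get(column) for r in rows]
    return len(values) == len(set(values))

# Invented sample data standing in for a Silver-layer table.
orders = [
    {"order_id": 1, "amount": 120.0},
    {"order_id": 2, "amount": None},
    {"order_id": 3, "amount": 75.5},
]

assert is_unique(orders, "order_id")      # primary-key check passes
assert null_rate(orders, "amount") < 0.5  # 1/3 nulls is under the threshold
```

Checks like these are typically driven by metadata (column, rule, threshold) rather than hard-coded, so the same framework covers every pipeline.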

Required Skills and Qualifications

Core Data Engineering Skills

  • Strong experience in Data Engineering / Analytics Engineering with at least 2 years in architecture or solutioning roles
  • Advanced proficiency in SQL, Python, Spark, and Spark SQL
  • Strong understanding of distributed computing principles and large-scale data processing
  • Experience with Spark, Hive, BigQuery, and cloud-native data ecosystems
  • Expertise in dimensional modeling (Star Schema, Snowflake Schema)
  • Hands-on experience building scalable ETL/ELT pipelines
  • Strong understanding of DAGs, partitioning, shuffling, and query optimization
  • Experience with cloud platforms like AWS, Azure, or GCP
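To illustrate the dimensional-modeling expertise listed above, here is a toy star schema in plain Python: a fact table holding foreign keys and a measure, aggregated over an attribute of a dimension. All table contents are invented; a real model would live in a warehouse and be queried in SQL.

```python
# Dimension tables: surrogate key -> attributes (contents invented).
dim_customer = {
    1: {"name": "Acme", "region": "APAC"},
    2: {"name": "Globex", "region": "EMEA"},
}

# Fact table: foreign keys plus a numeric measure.
fact_sales = [
    {"customer_id": 1, "date_id": 20240101, "amount": 500.0},
    {"customer_id": 2, "date_id": 20240101, "amount": 300.0},
    {"customer_id": 1, "date_id": 20240101, "amount": 200.0},
]

def revenue_by_region(facts, customers):
    """Aggregate a fact measure over a dimension attribute (region)."""
    totals = {}
    for row in facts:
        region = customers[row["customer_id"]]["region"]
        totals[region] = totals.get(region, 0.0) + row["amount"]
    return totals

print(revenue_by_region(fact_sales, dim_customer))
```

A Snowflake schema differs only in that a dimension such as `dim_customer` would itself be normalized into further lookup tables (e.g. a separate region table).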

AI / GenAI Exposure (Preferred)

  • Experience with Vector Databases and Knowledge Graphs
  • Understanding of RAG architectures and semantic data layers
  • Familiarity with Generative AI data preparation and integration workflows
  • Exposure to AI-enabled analytics platforms or LLM-based applications
  • Experience with modern semantic or metric layer tools is a plus
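The retrieval step of a RAG architecture can be sketched in a few lines: rank documents by cosine similarity between their embeddings and a query embedding. The 3-dimensional vectors and document names below are made-up stand-ins; a real system would use model-generated embeddings stored in a vector database.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, docs, k=2):
    """Return the k document ids most similar to the query embedding."""
    ranked = sorted(docs.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]

# Invented toy embeddings; a real index would hold thousands of
# high-dimensional vectors.
docs = {
    "revenue_faq":  [0.9, 0.1, 0.0],
    "hr_policy":    [0.0, 0.2, 0.9],
    "sales_report": [0.8, 0.3, 0.1],
}

print(top_k([1.0, 0.2, 0.0], docs, k=2))
```

The retrieved documents are then passed to the LLM as context; the data engineering work in the bullets above is largely about preparing, chunking, and governing what goes into `docs`.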

Behavioral Expectations

  • Strong problem-solving mindset
  • High ownership and accountability
  • Ability to learn and adapt to new technologies quickly
  • Strong communication and stakeholder collaboration skills
  • High attention to quality, scalability, and engineering excellence

Job ID: 147481741


Skills:

Snowflake, Azure, Google Cloud, ETL, ELT, AWS, data processing, data integration frameworks, NoSQL databases