Data Engineering Architect
Experience: 8-12+ years
Location: Kolkata/Bangalore
Work Mode: Hybrid
Position Summary:
Own enterprise data architecture across OLTP, analytics and GenAI/Agentic AI workloads, both cloud and on-prem. Define lakehouse/warehouse target-state standards, and enable secure, scalable, cost-efficient RAG and agentic solutions. Partner with data engineering and AI teams to deliver governed, retrieval-ready datasets and metadata. Preferred: primary experience on Azure (Databricks + Synapse + Purview) OR AWS (Snowflake/Redshift + Glue), with exposure to the other. Experience with ServiceNow CMDB/ITSM/ITOM as a governed knowledge source is a plus.
Key Responsibilities:
- Own the roadmap for the lakehouse, warehouse, MDM and streaming layers. Architect on Azure or AWS (Databricks/Synapse or Snowflake/Redshift) with ADF/Glue/Airflow orchestration; design streaming with Kafka or Kinesis and CDC.
- Model data (dimensional, Data Vault 2.0, DDD) and define canonical and semantic layers. Build ELT/ETL in SQL, Python and PySpark, and batch/stream pipelines (Kafka, Kinesis, CDC, schema registry).
- Design vectorization pipelines (chunking, embeddings) and operate vector stores (Azure AI Search, Pinecone, FAISS).
- Data quality, metadata and lineage: Great Expectations/Deequ; Purview/Collibra/Alation; OpenLineage.
- Security, privacy and FinOps: IAM/RBAC/ABAC, encryption, PII/GDPR/ISO compliance; SLAs, cost and performance monitoring.
- Integrate ServiceNow via the Table/REST APIs; align with CSDM; stream events to agents via Kafka.
Must-have Skills:
- Expert SQL; strong Python and PySpark.
- ELT/ETL orchestration: Airflow/ADF/Glue.
- Lakehouse/warehouse: Databricks Delta; Snowflake/Redshift/BigQuery; modeling (Kimball, Data Vault).
- Streaming and integration: Kafka or Kinesis; CDC; Avro/Protobuf.
- Cloud: Azure or AWS; security (IAM/TLS/KMS), DR and performance; governance (Purview/Collibra/Alation), Great Expectations/Deequ, PII controls.
Good to Have:
- Agentic/RAG architectures (LangChain, LangGraph) with vector stores (Azure AI Search, Pinecone, FAISS), and unification of ServiceNow knowledge with the lakehouse.
- Exposure to ServiceNow data integration (Table/REST APIs, CSDM/CMDB).
- Cost and performance tuning for GenAI data flows: Delta/Iceberg/Hudi tuning; Databricks/Snowflake FinOps; real-time features via Kafka.
- Education: BE/BTech/ME/MTech in CS/IT, Data or a related field (preferred).
- Certifications: Azure/AWS Data Engineer or Architect, Databricks, Snowflake.