Requirement Details
Primary Location:
Noida
Position Overview (Job Summary):
A senior Data Platform Architect role responsible for defining, designing, and governing enterprise-wide AI-first data platform architectures. The role focuses on blueprinting advanced data ecosystems (Lakehouse, Data Mesh, Data Fabric), building scalable cloud-native infrastructure, and enabling end-to-end AI/ML workflows including feature stores, streaming systems, data governance, lineage, and real-time ML/AI operations.
This role acts as a strategic technical leader aligning business goals with scalable AI-driven data architectures.
Primary Skills:
- Data Platform Architecture (Enterprise-grade)
- Lakehouse / Data Mesh / Data Fabric design
- Cloud platforms (AWS, Azure, GCP)
- Big Data technologies (Spark, Databricks, Snowflake, ADF)
- Real-time data streaming (Kafka, Kinesis)
- Feature Stores & ML data infrastructure
- Data governance, lineage, and metadata management
- Security frameworks (RBAC, Zero Trust, GDPR/CCPA/HIPAA compliance)
- Vector Databases (Pinecone, PGVector, Oracle Vector DB)
- Knowledge Graph architectures
Secondary Skills:
- Lambda/Kappa Architecture
- Hub-and-Spoke data architecture
- MLOps integration (model registries, monitoring)
- Data quality frameworks
- Bottleneck analysis in low-latency, high-volume AI systems
- Strong stakeholder communication & cross-functional alignment
Experience:
- 10-16 years in Data Warehouse / Big Data / Data Platform engineering
- 3-5+ years in AI/ML infrastructure architecture
- Band: 4.2 / 5.1
Role and Responsibilities
A. Key Responsibilities
1. Architectural Blueprinting
- Design scalable, secure data platform architectures (Lakehouse, Mesh, Fabric).
- Create cloud-native data storage and processing frameworks for AI-scale workloads.
- Develop architecture supporting massive data volumes for model training and inference.
2. AI Data Infrastructure Design
- Architect feature stores, streaming systems, and automated ML pipelines.
- Build real-time data ingestion and AI-ready serving pipelines using Spark/Kafka.
3. Data Lifecycle Management
- Govern the end-to-end lifecycle: acquisition → cleaning → preprocessing → serving.
- Automate data pipelines for ingestion, transformation, and feature engineering.
- Architect systems for streaming data (Kafka/Kinesis) enabling real-time ML use cases.
- Implement metadata management, data lineage, and quality monitoring systems.
4. Governance & Ethics
- Define unified governance frameworks ensuring data privacy, compliance, and security.
- Implement controls to mitigate algorithmic bias in AI training datasets.
5. Security & Compliance
- Embed Zero Trust, RBAC, encryption, and regulatory compliance into system design.
- Ensure architecture adheres to standards like GDPR, HIPAA, CCPA.
6. Stakeholder Collaboration
- Serve as a technical bridge between business leaders, data scientists, ML engineers, and IT teams.
- Translate business requirements into actionable technical designs.
7. MLOps Collaboration
- Integrate feature stores, model registries, and monitoring tools with AI/ML workflows.
- Enable continuous retraining and automated deployment pipelines.
B. Additional Responsibilities
- Identify architectural bottlenecks and optimize for high-volume, low-latency AI workloads.
- Drive architectural best practices across data engineering and ML engineering teams.
- Guide cloud modernization and digital transformation initiatives.
- Provide architectural governance across data products and platform teams.
Educational Qualification:
- Bachelor's or Master's degree in Computer Science, Information Systems, Engineering, or a related technical field
Certifications (Preferred but not mandatory):
- Cloud Architect Certifications (AWS / Azure / GCP)
- Databricks Certified Data Engineer / Architect
- Snowflake Architect Certification
- TOGAF / Zachman (Optional for architecture governance)
- Certifications in AI/ML, MLOps, or Data Governance (nice to have)