
Data Platform Architect


Job Description

Role Overview

We are looking for a seasoned Data Platform Architect to own the design and delivery of three foundational pillars: a resilient multi-cloud data platform, an enterprise MCP (Model Context Protocol) Server layer that connects AI workloads to governed data assets, and a high-throughput message bus capable of sustaining millions of events per second across distributed consumers. This is a senior architecture role that combines deep hands-on engineering with cross-functional influence across data, AI, and infrastructure teams.

Must-Haves

  • Multi-Cloud Enablement
  • MCP Server Foundation
  • High-Throughput Message Bus

Key Responsibilities

Multi-Cloud Data Platform Enablement
  • Architect a cloud-agnostic data platform that operates seamlessly across AWS, Azure, and GCP, with unified identity, governance, and cost controls
  • Define the reference architecture for lakehouse deployments (Delta Lake / Iceberg / Hudi) on each cloud, ensuring format interoperability and zero-lock-in data portability (a short sketch follows this list)
  • Design cross-cloud data movement patterns including replication, federation, and active-active topologies using tools such as Debezium, Airbyte, and cloud-native transfer services
  • Establish a cloud-agnostic Unity Catalog or open metadata layer for consistent lineage, access control, and discoverability across all cloud zones
  • Drive FinOps practices: right-sizing compute, storage tiering, and reserved capacity planning across cloud providers
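To make the portability requirement above concrete, the short sketch below configures a single PySpark session with Delta Lake and writes the same table to object storage on two clouds; the bucket and container names are hypothetical, and credentials are assumed to be supplied by the environment.

```python
# Minimal sketch: one Spark session writing the same open-format (Delta)
# table to object stores on two clouds. Bucket/container names are
# hypothetical; the delta-spark package is assumed to be on the classpath.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("portable-lakehouse-demo")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

df = spark.createDataFrame(
    [(1, "pump-07", 42.1), (2, "pump-09", 39.8)],
    ["event_id", "device_id", "temperature_c"],
)

# Same table format on two clouds: only the storage URI changes,
# which is what keeps the data portable and lock-in free.
for path in (
    "s3a://acme-lake-bronze/telemetry",                        # AWS
    "abfss://bronze@acmelake.dfs.core.windows.net/telemetry",  # Azure
):
    df.write.format("delta").mode("append").save(path)
```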
MCP Server Foundation

  • Architect and build the enterprise MCP Server layer that exposes governed data assets, query interfaces, and tool APIs to LLM-driven agents and copilots
  • Define the MCP resource taxonomy: which data assets surface as Resources, which operations become Tools, and which contextual feeds become Prompts
  • Implement authentication and authorization at the MCP boundary, ensuring AI agents operate within row-level, column-level, and dataset-level access policies (see the sketch after this list)
  • Design the MCP Server for multi-tenancy, supporting concurrent agent workloads with rate limiting, audit logging, and observability hooks
  • Collaborate with AI/ML teams to validate that MCP-served context materially reduces hallucination rates and improves retrieval grounding quality
  • Produce the MCP Server SDK integration guide for internal engineering teams building AI-powered applications
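As one way the Resource/Tool split and the boundary policies above could look in practice, here is a minimal sketch built on FastMCP from the official MCP Python SDK; the orders dataset and the allowed_regions policy helper are hypothetical stand-ins for a real governed access layer.

```python
# Minimal MCP server sketch using the official Python SDK's FastMCP helper.
# The dataset and the row-level policy check are hypothetical placeholders.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("governed-data-platform")

ORDERS = [
    {"region": "EMEA", "order_id": 1, "amount": 120.0},
    {"region": "APAC", "order_id": 2, "amount": 80.5},
]

def allowed_regions(agent_id: str) -> set[str]:
    # Hypothetical row-level policy lookup; a real implementation would
    # consult the catalog's access-control service.
    return {"EMEA"}

@mcp.resource("dataset://orders/schema")
def orders_schema() -> str:
    """Expose the dataset schema as a read-only MCP Resource."""
    return "orders(region: string, order_id: int, amount: double)"

@mcp.tool()
def query_orders(agent_id: str, region: str) -> list[dict]:
    """Query orders as an MCP Tool, enforcing row-level policy at the boundary."""
    if region not in allowed_regions(agent_id):
        raise PermissionError(f"agent {agent_id} may not read region {region}")
    return [row for row in ORDERS if row["region"] == region]

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default
```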
High-Throughput Message Bus Architecture

  • Design and own the enterprise message bus architecture targeting sustained throughput of 1M+ events/sec with sub-50ms end-to-end latency at P99
  • Evaluate and select the appropriate messaging backbone (Apache Kafka, Confluent Platform, Redpanda, AWS Kinesis, or Azure Event Hubs) based on workload profiles
  • Define partitioning strategies, topic compaction policies, retention tiers, and tiered storage configurations aligned to IoT telemetry, CDC, and operational event patterns
  • Architect Schema Registry governance including schema evolution contracts (Avro / Protobuf / JSON Schema) and compatibility enforcement pipelines
  • Design consumer group topologies for stream processing frameworks (Flink, Spark Structured Streaming, Delta Live Tables) and ensure backpressure and offset management are production-grade
  • Integrate the message bus with the multi-cloud lakehouse as a bronze ingestion layer, enforcing idempotency and exactly-once delivery guarantees (a producer sketch follows this list)
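For the delivery-guarantee side referenced above, here is a minimal confluent-kafka producer sketch with idempotence enabled and messages keyed so related events land on one partition; the broker address and topic name are hypothetical, and end-to-end exactly-once would additionally require transactional consumers and sinks.

```python
# Minimal sketch of an idempotent, keyed Kafka producer using confluent-kafka.
# Broker address and topic are hypothetical.
import json
from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "broker-1:9092",
    "enable.idempotence": True,   # no duplicates on broker-side retries
    "acks": "all",                # wait for full ISR acknowledgement
    "compression.type": "lz4",
    "linger.ms": 5,               # small batching window for throughput
})

def on_delivery(err, msg):
    if err is not None:
        print(f"delivery failed: {err}")
    else:
        print(f"delivered to {msg.topic()}[{msg.partition()}]@{msg.offset()}")

event = {"device_id": "pump-07", "temperature_c": 42.1}
# Keying by device_id keeps a device's events ordered on one partition.
producer.produce(
    "iot.telemetry.bronze",
    key=event["device_id"],
    value=json.dumps(event),
    callback=on_delivery,
)
producer.flush()
```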
Platform Governance & Engineering Excellence

  • Define and enforce platform-wide standards: naming conventions, tagging taxonomy, SLA tiers, DR objectives, and run-book templates
  • Champion Infrastructure-as-Code practices across Terraform, Pulumi, or Bicep for all cloud resources and data platform components (see the sketch after this list)
  • Lead architecture review boards (ARBs) and own the log of architecture decision records (ADRs) for all major platform choices
  • Mentor senior data engineers and serve as escalation point for platform-level production incidents
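As a small illustration of codifying such standards, the sketch below uses Pulumi's Python SDK to stamp a mandatory tagging taxonomy onto a storage bucket; the tag keys, values, and bucket name are hypothetical examples of a convention, not the actual one.

```python
# Minimal Pulumi (Python) sketch: enforce a tagging taxonomy in code so
# every resource carries ownership, SLA tier, and DR metadata.
# Tag keys, values, and the bucket name are hypothetical; run via `pulumi up`.
import pulumi
import pulumi_aws as aws

MANDATORY_TAGS = {
    "owner": "data-platform-team",
    "sla-tier": "gold",
    "dr-objective": "rpo-15m",
    "cost-center": "cc-4211",
}

bronze_bucket = aws.s3.Bucket(
    "bronze-telemetry",
    tags=MANDATORY_TAGS,
)

pulumi.export("bronze_bucket_name", bronze_bucket.id)
```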

Requirements

Required Qualifications

  • 10+ years in data engineering with at least 3 years in platform or solutions architecture roles
  • Hands-on experience architecting production lakehouse platforms on two or more of: AWS (S3, Glue, Athena, EMR, Kinesis), Azure (ADLS Gen2, Databricks, Event Hubs, Synapse), GCP (BigQuery, Dataflow, Pub/Sub)
  • Deep expertise in Apache Kafka or equivalent message bus: cluster sizing, partition leadership, consumer lag management, and MirrorMaker 2 / replication topologies
  • Strong command of open table formats: Delta Lake, Apache Iceberg, or Apache Hudi — including time travel, merge-on-read vs. copy-on-write trade-offs, and OPTIMIZE / VACUUM strategies
  • Proficiency in Python and PySpark for platform automation, ingestion framework development, and schema validation pipelines (a small validation sketch follows this list)
  • Demonstrated experience with metadata management: Apache Atlas, Unity Catalog, DataHub, or equivalent open metadata solutions
  • Familiarity with MCP specification or equivalent AI tool-use protocols; experience building or integrating API layers consumed by LLM agents is a strong plus
  • Infrastructure-as-Code fluency (Terraform, Pulumi, or equivalent) and CI/CD pipeline design for data platform deployments
  • Strong written communication skills: ability to produce architecture decision records, RFP responses, and client-facing implementation guides
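For the schema-validation pipelines mentioned above, a minimal PySpark sketch might look like the following: incoming JSON is parsed against an explicit schema and non-conforming rows are routed to a quarantine path; all paths and field names are hypothetical.

```python
# Minimal sketch of a schema-validation step for an ingestion framework:
# JSON payloads are parsed against an explicit schema, and rows that fail
# are quarantined instead of landing in the bronze layer.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import DoubleType, StringType, StructField, StructType

spark = SparkSession.builder.appName("schema-validation-demo").getOrCreate()

expected = StructType([
    StructField("device_id", StringType()),
    StructField("temperature_c", DoubleType()),
])

raw = spark.createDataFrame(
    [('{"device_id": "pump-07", "temperature_c": 42.1}',),
     ('{"device_id": "pump-09"}',)],  # missing field -> fails validation
    ["payload"],
)

parsed = raw.withColumn("rec", F.from_json("payload", expected))
valid = parsed.filter(F.col("rec.temperature_c").isNotNull()).select("rec.*")
invalid = parsed.filter(F.col("rec.temperature_c").isNull()).select("payload")

valid.write.mode("append").format("parquet").save("/lake/bronze/telemetry")
invalid.write.mode("append").text("/lake/quarantine/telemetry")
```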

Preferred Qualifications

  • Experience with Delta Live Tables (DLT) in Databricks, including CDC pipeline design and Liquid Clustering optimization
  • Exposure to vector databases (Pinecone, Weaviate, pgvector) and RAG pipeline architecture for grounding LLMs in enterprise data
  • Familiarity with Redpanda or Confluent Cloud as managed Kafka alternatives and their cost/performance trade-offs at scale
  • Knowledge of data mesh operating models: domain ownership, data products, and federated governance
  • Experience in regulated industries (energy, manufacturing, IoT telemetry) where data quality, auditability, and retention policies are mission-critical
  • Cloud certifications: AWS Data Analytics Specialty, Azure Data Engineer Associate, GCP Professional Data Engineer, or Databricks Certified Data Engineer Professional
  • Prior consulting or multi-client engagement experience; comfort navigating multiple concurrent stakeholder environments

Job ID: 147195369
