Design, build, and operate highly scalable data pipelines on GCP as the backbone of a Knowledge Platform.
Develop robust batch and streaming pipelines using Dataflow (Apache Beam), Pub/Sub, Cloud Composer, and Spark, ensuring reliability and throughput.
Build and own RAG data pipelines: source ingestion, normalization, document chunking, embedding generation, indexing, and refresh strategies (an illustrative chunk-and-embed sketch follows this list).
Implement and manage Vector Databases (e.g., Vertex AI Vector Search, Pinecone, Weaviate, FAISS) with a strong focus on performance and lifecycle management (see the FAISS sketch below).
Design and maintain Knowledge Graph pipelines for entity extraction, normalization, relationship modeling, and incremental graph updates (see the graph-upsert sketch below).
Model and operate NoSQL and graph data stores (Bigtable, Firestore, Cassandra, MongoDB, and graph databases) for low-latency, large-scale access.
Enforce data quality, lineage, and observability across pipelines, including validation, monitoring, and backfill strategies (see the validation sketch below).
Collaborate closely with ML/LLM and Product teams to productionize knowledge-driven use cases without compromising data engineering rigor.
Apply cloud-native best practices on GCP for security, IAM, cost optimization, CI/CD, and Infrastructure as Code (Terraform).
Strong data engineering experience with Python/Java and hands-on exposure to knowledge platforms, RAG, or graph-based systems.
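
To make the RAG pipeline responsibility concrete, here is a minimal sketch of a chunk-and-embed step, assuming the Apache Beam Python SDK and the local runner; `chunk_document` and `embed_chunk` are hypothetical helpers, not a specific Dataflow or Vertex AI API, and a real pipeline would read from GCS or Pub/Sub and write to an index rather than printing.

```python
# Minimal sketch, assuming the Apache Beam Python SDK; helper names are illustrative.
import apache_beam as beam


def chunk_document(doc: str, size: int = 500, overlap: int = 50):
    """Split a document into overlapping character chunks ahead of embedding."""
    step = size - overlap
    for start in range(0, max(len(doc) - overlap, 1), step):
        yield doc[start:start + size]


def embed_chunk(chunk: str) -> dict:
    """Hypothetical embedding step; in practice this wraps a model endpoint call."""
    return {"text": chunk, "embedding": [float(len(chunk))]}  # stand-in vector


with beam.Pipeline() as pipeline:
    (
        pipeline
        | "ReadDocs" >> beam.Create(["first source document ...", "second source document ..."])
        | "Chunk" >> beam.FlatMap(chunk_document)
        | "Embed" >> beam.Map(embed_chunk)
        | "Sink" >> beam.Map(print)  # real pipelines would write to a vector index or warehouse
    )
```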
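
For the vector database responsibility, a small FAISS sketch of the index-and-query lifecycle; the dimensionality and random vectors are placeholders, and a managed service such as Vertex AI Vector Search with real model embeddings would replace the in-process index in production.

```python
# Minimal FAISS sketch with stand-in vectors; not a production index configuration.
import faiss
import numpy as np

dim = 128                                              # assumed embedding dimensionality
corpus = np.random.rand(1000, dim).astype("float32")   # stand-in corpus embeddings

index = faiss.IndexFlatL2(dim)                         # exact L2 search; ANN indexes scale further
index.add(corpus)

query = np.random.rand(1, dim).astype("float32")
distances, ids = index.search(query, 5)                # top-5 nearest chunks for one query
print(ids[0], distances[0])
```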
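
For the knowledge graph responsibility, a hedged sketch of an idempotent triple upsert for incremental updates; networkx is used purely as a stand-in for a production graph store, `upsert_triples` is a hypothetical helper, and entity/relation extraction is assumed to have already produced the triples.

```python
# Hedged sketch of incremental graph updates; networkx stands in for a real graph store.
import networkx as nx

graph = nx.MultiDiGraph()


def upsert_triples(g: nx.MultiDiGraph, triples):
    """Merge (subject, relation, object) triples so re-running a batch never duplicates edges."""
    for subj, rel, obj in triples:
        existing = g.get_edge_data(subj, obj, default={}) or {}
        if not any(attrs.get("relation") == rel for attrs in existing.values()):
            g.add_edge(subj, obj, relation=rel)


upsert_triples(graph, [("Ada Lovelace", "worked_with", "Charles Babbage")])
upsert_triples(graph, [("Ada Lovelace", "worked_with", "Charles Babbage")])  # replay is a no-op
print(graph.number_of_edges())  # 1
```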
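
For the data quality responsibility, an illustrative validation gate; the field names and null-rate threshold are assumptions rather than any particular framework's API, and the point is simply to fail a batch loudly before it reaches the index so a backfill can rerun from clean inputs.

```python
# Illustrative quality gate only; field names and threshold are assumed values.
def validate_batch(rows, required_fields=("id", "text", "updated_at"), max_null_rate=0.01):
    """Reject a batch when required fields are missing more often than the threshold."""
    total = len(rows)
    if total == 0:
        raise ValueError("empty batch: refusing to replace a healthy index with nothing")
    for field in required_fields:
        nulls = sum(1 for row in rows if row.get(field) in (None, ""))
        if nulls / total > max_null_rate:
            raise ValueError(f"{field}: null rate {nulls / total:.1%} exceeds {max_null_rate:.1%}")
    return rows


validate_batch([{"id": 1, "text": "chunk text", "updated_at": "2024-01-01T00:00:00Z"}])
```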