GCP Data Engineer (6-8 Years Experience)
Role Overview
We are looking for a highly skilled Data Engineer with 6-8 years of experience to design, build, and optimize scalable data pipelines and AI-driven data solutions. The ideal candidate will have hands-on expertise in Kubernetes (GKE), the GCP ecosystem, RAG-based architectures, and agentic/AI pipelines, along with experience in data processing, extraction, and integration workflows.
Key Responsibilities
- Design and develop scalable data pipelines for data extraction, transformation, and ingestion.
- Build and manage AI/ML data pipelines, including RAG (Retrieval-Augmented Generation) and agent-based workflows.
- Develop and orchestrate data extraction and ingestion pipelines across structured and unstructured data sources.
- Deploy and manage application jobs on Kubernetes (GKE) for high availability and performance.
- Integrate with GCP services such as BigQuery, Cloud Storage, Pub/Sub, Dataflow, and Apigee.
- Work with SDP (Sensitive Data Protection/DLP) and Model Armor for secure data handling and compliance.
- Collaborate with cross-functional teams to ensure data governance, security, and privacy compliance.
- Build and maintain CI/CD pipelines using GitHub and related DevOps tools.
- Optimize performance, scalability, and cost-efficiency of data and AI systems.
- Troubleshoot production issues and ensure system reliability.
Required Skills & Experience
Core Technical Skills
- Strong experience in Kubernetes and Google Kubernetes Engine (GKE)
- Hands-on experience with Google Cloud Platform (GCP) services
- Expertise in data pipeline development (batch & real-time)
- Experience with data extraction, transformation, and ingestion pipelines
- Solid understanding of RAG architectures and vector-based retrieval systems
- Experience in building agentic pipelines / AI-driven workflows
Platform & Tools
- GitHub (version control, CI/CD workflows)
- Apigee (API management and integration)
- SDP/DLP and Model Armor (data security and compliance handling)
- Experience with event-driven architectures (Pub/Sub, Eventarc preferred)
Programming & Data
- Strong programming skills in Python / PySpark
- Experience working with SQL and NoSQL databases
- Familiarity with BigQuery, Databricks, or Spark-based processing
Good to Have
- Experience with Vertex AI / AI platform integration
- Knowledge of data masking, tokenization, and privacy frameworks
- Exposure to healthcare or sensitive data ecosystems (PII/PHI handling)
- Understanding of microservices architecture and API-driven systems