Infogain India

GCP Data Architect (Principal)

18-21 Years

Qualification

  • 18+ years in data/analytics engineering with 10+ years architecting solutions on public cloud; 5+ years hands-on with GCP.
  • Proven delivery of Medallion (Bronze/Silver/Gold) architectures on GCP with GCS + Dataproc + BigQuery at enterprise scale.
  • Expert in PySpark and Dataproc (job orchestration, autoscaling, cluster policies, tuning, troubleshooting).
  • Strong BigQuery expertise: storage/compute separation, partitioning, clustering, materialized views, BI Engine, slot management, RLS/CLS.
  • Hands-on experience with Vertex AI (Pipelines, Feature Store, training/serving, registry, monitoring) and ML Ops best practices.
  • Experience implementing Dataplex for centralized governance (catalog, policy tags) and least-privilege IAM, plus data security/compliance controls.
  • Practical integration with Oracle and Teradata via JDBC; familiarity with CDC patterns and schema evolution (a minimal ingestion sketch follows this list).
  • CI/CD for data platforms (Cloud Build/GitHub Actions), orchestration (Cloud Composer/Airflow), and Infrastructure as Code (Terraform).
  • Deep understanding of data modeling, data quality, lineage, and observability for data systems.
  • Excellent communication, stakeholder management, and leadership across technical and business teams.
  • Google certifications: Professional Cloud Architect and/or Professional Data Engineer (highly preferred).
  • Experience modernizing SAS workloads and translating SAS macros/PROCs to PySpark/SQL on GCP.
  • Knowledge of streaming (Pub/Sub, Dataflow/Flink/Spark Structured Streaming) for near-real-time requirements.
  • Experience with VPC Service Controls, Private Service Connect, Organization Policy, Workload Identity Federation.
  • Familiarity with Delta/Iceberg/Hudi tables and open table formats on GCS; data sharing patterns (Analytics Hub).
  • Bachelor's/Master's in Computer Science, Engineering, Information Systems, or equivalent experience.
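
For context on the JDBC ingestion bullet above, here is a minimal sketch of a Bronze-layer Dataproc PySpark job that reads an Oracle table over JDBC and lands it raw on GCS. All hosts, credentials, table names, and bucket paths are hypothetical placeholders, and the Oracle JDBC driver jar is assumed to be available on the cluster.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("bronze-oracle-ingest").getOrCreate()

# Hypothetical connection details; in practice, fetch credentials from
# Secret Manager rather than hard-coding them.
orders = (
    spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@//oracle-host:1521/ORCLPDB")
    .option("dbtable", "SALES.ORDERS")
    .option("user", "etl_user")
    .option("password", "<from-secret-manager>")
    .option("fetchsize", 10000)              # rows per network round trip
    .option("partitionColumn", "ORDER_ID")   # parallelize the read across executors
    .option("lowerBound", 1)
    .option("upperBound", 100000000)
    .option("numPartitions", 16)
    .load()
)

# Bronze keeps the data as-is; only a load-date column is added so the
# landing zone can be partitioned and reprocessed by day.
(orders
    .withColumn("load_date", F.current_date())
    .write.mode("append")
    .partitionBy("load_date")
    .parquet("gs://example-bronze-bucket/oracle/sales_orders/"))
```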

Roles & Responsibilities

  • Design and own the end-to-end analytics architecture on GCP, ensuring alignment with business, security, cost, and performance goals.
  • Implement a Medallion architecture (see the sketches after this list):
      • Bronze (Raw) ingestion on GCS via JDBC from Oracle/Teradata.
      • Silver (Curated) transformations using PySpark on Dataproc.
      • Gold optimized in BigQuery for analytics and BI.
  • Define canonical data models, storage formats (Parquet/ORC/Delta/Iceberg), and partitioning/clustering strategies.
  • Lead migration from SAS to PySpark, establish coding standards, and optimize Spark jobs.
  • Build JDBC ingestion pipelines with CDC, robust retries, and schema evolution handling; orchestrate workflows via Cloud Composer/Airflow and CI/CD (an orchestration sketch also follows this list).
  • Architect BigQuery models, manage cost/performance, enforce SLAs, and integrate securely with BI tools using RLS/CLS.
  • Define ML Ops workflows on Vertex AI, including feature pipelines, automated training, deployment, and model monitoring for drift/bias.
  • Implement centralized governance via Dataplex (catalog, policy tags), IAM least privilege, VPC-SC, and data security/compliance controls.
  • Drive cost optimization, reliability/SRE practices, monitoring, DR/BCP, and FinOps governance.
  • Provide architectural leadership, mentor teams, set standards, and create roadmaps, ADRs, and executive-level communication.
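
The Medallion bullets above reference these sketches. First, the Silver and Gold steps as one Dataproc PySpark job: cleanse the Bronze landing data, persist a curated Silver copy, and publish an aggregate to BigQuery. The bucket, dataset, and column names carry over from the hypothetical ingestion sketch, and the spark-bigquery connector is assumed to be installed on the cluster.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("silver-to-gold").getOrCreate()

bronze = spark.read.parquet("gs://example-bronze-bucket/oracle/sales_orders/")

# Silver: basic conformance and data-quality rules on the raw data.
silver = (
    bronze
    .dropDuplicates(["ORDER_ID"])
    .withColumn("order_ts", F.to_timestamp("ORDER_TS"))
    .filter(F.col("order_ts").isNotNull())
)
silver.write.mode("overwrite").parquet("gs://example-silver-bucket/sales_orders/")

# Gold: an analytics-ready aggregate loaded into BigQuery via the
# spark-bigquery connector (staged through a GCS bucket).
gold = (
    silver
    .groupBy(F.to_date("order_ts").alias("order_date"), "CUSTOMER_ID")
    .agg(F.sum("AMOUNT").alias("daily_spend"))
)
(gold.write.format("bigquery")
    .option("table", "example-project.analytics.daily_spend")
    .option("temporaryGcsBucket", "example-staging-bucket")
    .mode("overwrite")
    .save())
```

In a production Gold layer the BigQuery target would typically be partitioned on order_date and clustered on CUSTOMER_ID, matching the partitioning/clustering expectations listed under Qualification.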
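
Second, the orchestration sketch: a minimal Cloud Composer (Airflow 2.x) DAG chaining the two Dataproc jobs above. The project, region, cluster, and script paths are hypothetical placeholders.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.dataproc import (
    DataprocSubmitJobOperator,
)

PROJECT = "example-project"   # placeholder project/cluster/bucket names
REGION = "us-central1"
CLUSTER = "etl-cluster"

def pyspark_job(script_uri: str) -> dict:
    """Build a Dataproc PySpark job spec for a script stored on GCS."""
    return {
        "reference": {"project_id": PROJECT},
        "placement": {"cluster_name": CLUSTER},
        "pyspark_job": {"main_python_file_uri": script_uri},
    }

with DAG(
    dag_id="medallion_daily",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",   # use schedule_interval on Airflow < 2.4
    catchup=False,
) as dag:
    bronze = DataprocSubmitJobOperator(
        task_id="bronze_ingest",
        job=pyspark_job("gs://example-code-bucket/bronze_oracle_ingest.py"),
        region=REGION,
        project_id=PROJECT,
    )
    silver_gold = DataprocSubmitJobOperator(
        task_id="silver_to_gold",
        job=pyspark_job("gs://example-code-bucket/silver_to_gold.py"),
        region=REGION,
        project_id=PROJECT,
    )
    # Silver/Gold runs only after Bronze ingestion succeeds.
    bronze >> silver_gold
```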

Experience

  • 18-21 Years

Skills

  • Primary Skill: Data Engineering
  • Sub Skill(s): Data Engineering
  • Additional Skill(s): Big Data, GCP-Apps, PySpark, BigQuery

Job ID: 145353745