Infogain India

GCP Data Architect (Principal)

18-21 Years

Qualification

  • 18+ years in data/analytics engineering with 10+ years architecting solutions on public cloud; 5+ years hands-on with GCP.
  • Proven delivery of Medallion (Bronze/Silver/Gold) architectures on GCP with GCS + Dataproc + BigQuery at enterprise scale.
  • Expert in PySpark and Dataproc (job orchestration, autoscaling, cluster policies, tuning, troubleshooting).
  • Strong BigQuery expertise: storage/compute separation, partitioning, clustering, materialized views, BI Engine, slot management, RLS/CLS.
  • Hands-on experience with Vertex AI (Pipelines, Feature Store, training/serving, registry, monitoring) and ML Ops best practices.
  • Experience implementing Dataplex for centralized governance (catalog, policy tags) and least-privilege IAM, plus data security/compliance controls.
  • Practical integration with Oracle and Teradata via JDBC; familiarity with CDC patterns and schema evolution (a minimal ingestion sketch follows this list).
  • CI/CD for data platforms (Cloud Build/GitHub Actions), orchestration (Cloud Composer/Airflow), and Infrastructure as Code (Terraform).
  • Deep understanding of data modeling, data quality, lineage, and observability for data systems.
  • Excellent communication, stakeholder management, and leadership across technical and business teams.
  • Google certifications: Professional Cloud Architect and/or Professional Data Engineer (highly preferred).
  • Experience modernizing SAS workloads and translating SAS macros/PROCs to PySpark/SQL on GCP.
  • Knowledge of streaming (Pub/Sub, Dataflow/Flink/Spark Structured Streaming) for near-real-time requirements.
  • Experience with VPC Service Controls, Private Service Connect, Organization Policy, Workload Identity Federation.
  • Familiarity with Delta/Iceberg/Hudi tables and open table formats on GCS; data sharing patterns (Analytics Hub).
  • Bachelor's/Master's in Computer Science, Engineering, Information Systems, or equivalent experience.
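
For context on the JDBC ingestion bullet above, here is a minimal sketch of a Bronze-layer Dataproc PySpark job that reads an Oracle table over JDBC and lands it raw on GCS. All hosts, credentials, table names, and bucket paths are hypothetical placeholders, and the Oracle JDBC driver jar is assumed to be available on the cluster.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("bronze-oracle-ingest").getOrCreate()

# Hypothetical connection details; in practice, fetch credentials from
# Secret Manager rather than hard-coding them.
orders = (
    spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@//oracle-host:1521/ORCLPDB")
    .option("dbtable", "SALES.ORDERS")
    .option("user", "etl_user")
    .option("password", "<from-secret-manager>")
    .option("fetchsize", 10000)              # rows per network round trip
    .option("partitionColumn", "ORDER_ID")   # parallelize the read across executors
    .option("lowerBound", 1)
    .option("upperBound", 100000000)
    .option("numPartitions", 16)
    .load()
)

# Bronze keeps the data as-is; only a load-date column is added so the
# landing zone can be partitioned and reprocessed by day.
(orders
    .withColumn("load_date", F.current_date())
    .write.mode("append")
    .partitionBy("load_date")
    .parquet("gs://example-bronze-bucket/oracle/sales_orders/"))
```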

Roles & Responsibilities

  • Design and own the end-to-end analytics architecture on GCP, ensuring alignment with business, security, cost, and performance goals.
  • Implement a Medallion architecture (see the sketches after this list):
      • Bronze (Raw) ingestion on GCS via JDBC from Oracle/Teradata.
      • Silver (Curated) transformations using PySpark on Dataproc.
      • Gold optimized in BigQuery for analytics and BI.
  • Define canonical data models, storage formats (Parquet/ORC/Delta/Iceberg), and partitioning/clustering strategies.
  • Lead migration from SAS to PySpark, establish coding standards, and optimize Spark jobs.
  • Build JDBC ingestion pipelines with CDC, robust retries, and schema evolution handling; orchestrate workflows via Cloud Composer/Airflow and CI/CD (an orchestration sketch also follows this list).
  • Architect BigQuery models, manage cost/performance, enforce SLAs, and integrate securely with BI tools using RLS/CLS.
  • Define ML Ops workflows on Vertex AI, including feature pipelines, automated training, deployment, and model monitoring for drift/bias.
  • Implement centralized governance via Dataplex (catalog, policy tags), IAM least privilege, VPC-SC, and data security/compliance controls.
  • Drive cost optimization, reliability/SRE practices, monitoring, DR/BCP, and FinOps governance.
  • Provide architectural leadership, mentor teams, set standards, and create roadmaps, ADRs, and executive-level communication.
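
The Medallion bullets above reference these sketches. First, the Silver and Gold steps as one Dataproc PySpark job: cleanse the Bronze landing data, persist a curated Silver copy, and publish an aggregate to BigQuery. The bucket, dataset, and column names carry over from the hypothetical ingestion sketch, and the spark-bigquery connector is assumed to be installed on the cluster.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("silver-to-gold").getOrCreate()

bronze = spark.read.parquet("gs://example-bronze-bucket/oracle/sales_orders/")

# Silver: basic conformance and data-quality rules on the raw data.
silver = (
    bronze
    .dropDuplicates(["ORDER_ID"])
    .withColumn("order_ts", F.to_timestamp("ORDER_TS"))
    .filter(F.col("order_ts").isNotNull())
)
silver.write.mode("overwrite").parquet("gs://example-silver-bucket/sales_orders/")

# Gold: an analytics-ready aggregate loaded into BigQuery via the
# spark-bigquery connector (staged through a GCS bucket).
gold = (
    silver
    .groupBy(F.to_date("order_ts").alias("order_date"), "CUSTOMER_ID")
    .agg(F.sum("AMOUNT").alias("daily_spend"))
)
(gold.write.format("bigquery")
    .option("table", "example-project.analytics.daily_spend")
    .option("temporaryGcsBucket", "example-staging-bucket")
    .mode("overwrite")
    .save())
```

In a production Gold layer the BigQuery target would typically be partitioned on order_date and clustered on CUSTOMER_ID, matching the partitioning/clustering expectations listed under Qualification.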
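
Second, the orchestration sketch: a minimal Cloud Composer (Airflow 2.x) DAG chaining the two Dataproc jobs above. The project, region, cluster, and script paths are hypothetical placeholders.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.dataproc import (
    DataprocSubmitJobOperator,
)

PROJECT = "example-project"   # placeholder project/cluster/bucket names
REGION = "us-central1"
CLUSTER = "etl-cluster"

def pyspark_job(script_uri: str) -> dict:
    """Build a Dataproc PySpark job spec for a script stored on GCS."""
    return {
        "reference": {"project_id": PROJECT},
        "placement": {"cluster_name": CLUSTER},
        "pyspark_job": {"main_python_file_uri": script_uri},
    }

with DAG(
    dag_id="medallion_daily",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",   # use schedule_interval on Airflow < 2.4
    catchup=False,
) as dag:
    bronze = DataprocSubmitJobOperator(
        task_id="bronze_ingest",
        job=pyspark_job("gs://example-code-bucket/bronze_oracle_ingest.py"),
        region=REGION,
        project_id=PROJECT,
    )
    silver_gold = DataprocSubmitJobOperator(
        task_id="silver_to_gold",
        job=pyspark_job("gs://example-code-bucket/silver_to_gold.py"),
        region=REGION,
        project_id=PROJECT,
    )
    # Silver/Gold runs only after Bronze ingestion succeeds.
    bronze >> silver_gold
```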

Experience

  • 18-21 Years

Skills

  • Primary Skill: Data Engineering
  • Sub Skill(s): Data Engineering
  • Additional Skill(s): Big Data, GCP-Apps, PySpark, BigQuery

Job ID: 145353745