
Prodapt

Vertex AIOps Platform Engineer (GCP)

  • Posted 10 days ago

Job Description

Overview

The Lead AIOps Engineer is responsible for architecting, provisioning, and operationalizing AI platforms on Google Cloud across three environments: Sandbox, Dev, and Prod. The role covers cloud environment setup, IAM governance, CI/CD pipeline development, AIOps automation, drift detection, lifecycle process design, documentation, and alignment with broader enterprise platforms. This is a hands-on technical leadership position.

Responsibilities

Environment Provisioning

  • Conduct workshops to gather GCP environment requirements.
  • Design cloud architecture including VPC, IAM, subnetting, quotas, endpoints, and security controls.
  • Lead the provisioning of Sandbox, Dev, and Prod GCP projects using Terraform.
  • Oversee API enablement, configuration, and validation testing.

Role Definitions & IAM Governance

  • Define IAM roles for AI platform users (Owner, Support, ML Engineer, Viewer).
  • Create IAM matrices, RACI charts, and detailed access control documentation.
  • Ensure least-privilege access policies across Vertex AI and GCP services.
  • Coordinate reviews and approvals with security and architecture teams.

AIOps Framework Development

  • Design and implement drift detection, anomaly monitoring, canary releases, automated rollback, and observability components.
  • Build reusable CI/CD pipelines using Vertex AI Pipelines and Cloud Build.
  • Develop SOPs, diagrams, runbooks, and the full AIOps operations playbook.
  • Execute and validate synthetic drift, monitoring, and pipeline test scenarios.

Lifecycle Processes

  • Define the complete ML lifecycle from environment setup through deployment, monitoring, retraining triggers, and retirement.
  • Integrate lifecycle processes within CI/CD and AIOps automation.
  • Document all lifecycle flows in Confluence and conduct validation sessions.

Resource Planning & Cost Modelling

  • Develop team structure, roles, and support plans.
  • Build cost and usage models using GCP calculators and automation scripts.
  • Prepare development and production usage forecasts and long-term TCO estimates.

Alignment Analysis

  • Assess synergy with existing enterprise initiatives (Data Lake, Billing, Cloud Migration, Security).
  • Document dependencies, risks, and overlapping components.
  • Produce final recommendations and alignment reports.

Requirements


Core Technical Skills

  • Strong expertise in Google Cloud Platform: Vertex AI, IAM, VPC, Cloud Build, Cloud Run, Cloud Functions, Pub/Sub.
  • Deep experience with Terraform and Infrastructure as Code workflows.
  • Practical experience with AIOps and MLOps frameworks.
  • Proficient in Python for automation and monitoring jobs.
  • Experience designing and operating CI/CD pipelines for ML workloads.
  • Knowledge of observability tools such as Cloud Monitoring, Cloud Logging, and OpenTelemetry.

Soft Skills

  • Strong client-facing and stakeholder engagement abilities.
  • Experience leading engineering teams and driving architectural decisions.
  • Excellent documentation and presentation skills.
  • Ability to guide cross-functional teams through complex technical implementations.

Preferred Qualifications


  • GCP Professional ML Engineer or Cloud Architect certification.
  • Experience with Looker or other operational dashboards.
  • Background in ML engineering or SRE.


Job ID: 143987353