Infrastructure Ownership: Own Helpshift production services, ensure complete monitoring coverage, and troubleshoot and fix production issues
Infrastructure as Code (IaC): Design and maintain scalable GCP infrastructure using Terraform o
AI Orchestration & LLMOps: Build deployment pipelines for AI agents, and manage vector databases (e.g., Vertex AI Search, Pinecone, Weaviate, Elasticsearch) and model endpoints
Security (DevSecOps): Implement Security-by-Design, including IAM least-privilege access, secret management (Secret Manager), and automated vulnerability scanning for AI workloads
CI/CD Excellence: Architect, implement, and maintain secure, high-velocity CI/CD pipelines that automate deployment, configuration, and testing for both traditional microservices and AI model prompts/configurations
Observability: Set up comprehensive monitoring for system health and LLM-specific metrics (latency, token usage, and cost)
Cloud Governance: Optimise GCP costs and manage resource quotas, especially for GPU/TPU-intensive AI tasks
Cross-Cloud Deployment: Establish and optimise connectivity between apps deployed in different cloud environments (AWS and GCP)