- Proven experience designing, implementing, and managing cloud solutions on major cloud platforms (e.g., AWS, Azure, GCP).
- Strong understanding of cloud computing concepts, architectures, and services (IaaS, PaaS, SaaS).
- Hands-on experience with cloud automation and infrastructure-as-code tools (e.g., Terraform, CloudFormation, ARM templates).
- Experience with cloud security best practices and tools.
- Prompt Engineering Environments: Designing and implementing infrastructure to support prompt engineering workflows and experimentation.
- Agent orchestration & tool integration (e.g., LangChain).
- Infrastructure as Code (IaC): Terraform (expert), CloudFormation, Google Deployment Manager, Bicep.
- Containerization & Orchestration: Docker, Kubernetes (EKS, GKE, AKS).
- MLOps/Gen AIOps: CI/CD pipelines for AI models/agents, model versioning, monitoring.
- Programming/Scripting: Python (strong).
As a Cloud AI Infra Architect, you should have a minimum of 10 years of experience managing enterprise cloud infrastructure projects and driving automation through Gen AI. You will drive the adoption and optimization of our cloud infrastructure and services.
- Design, implement, and evolve highly available, scalable, and secure multi-cloud architectures specifically tailored for large language models (LLMs), foundation models, vector databases, prompt engineering environments, fine-tuning, and real-time inference for Gen AI.
- Develop infrastructure patterns and frameworks to support the deployment, orchestration, and management of autonomous AI agents, including their interaction with external tools, data sources, and reasoning engines.
- Drive the adoption and implementation of advanced IaC to automate the provisioning, configuration, and governance of all AI infrastructure.
- Proactively identify bottlenecks and implement innovative strategies for optimizing the performance, cost-efficiency, and resource utilization of high-compute AI workloads across all cloud providers.
- Define and enforce stringent security architectures, data governance policies, and compliance frameworks for sensitive AI data, models, and agent interactions (e.g., data privacy, responsible AI principles).
- Partner with Data Engineering to design and optimize data pipelines for large-scale, unstructured, and vector data required for Gen AI model training, fine-tuning, and retrieval-augmented generation (RAG).
- Collaborate closely with Data Scientists and ML/Gen AI Engineers to design and implement robust MLOps/Gen AIOps pipelines for continuous integration, continuous delivery (CI/CD), continuous training (CT), and continuous evaluation (CE) of Gen AI models and agents.
- Good communication skills.
- Good analytical and problem-solving skills.