- Design and implement production-grade AI/ML and Agentic AI solutions that drive end-to-end transformation across pricing, underwriting, and sales.
- Partner with Cloud, AIOps, Data Science, LOB IT, Enterprise Architecture, and Data teams to provision infrastructure, deploy services, and operate scalable AI platforms using modern DevOps practices.
- Leverage the AI Platform, agent development standards, and agent frameworks to build, deploy, monitor, and maintain agentic solutions and AI/ML pipelines.
- Architect and build highly available, scalable, secure, and fault-tolerant AI/ML systems, applying modern distributed-system patterns such as event-driven architecture, publish/subscribe, and point-to-point messaging.
- Design and implement agent memory, evaluation, and feedback mechanisms that support tuning and continuous improvement for quality, safety, and reliability.
- Apply advanced context engineering, adaptive prompting, and multi-agent coordination, and build RAG/Agentic RAG systems using techniques such as HyDE, RAPTOR, and GraphRAG to improve accuracy and relevance.
- Write high-quality, production-ready Python (e.g., asyncio, FastAPI, Pydantic) and build AI observability through OpenTelemetry instrumentation, offline evaluation, and drift monitoring, while leveraging enterprise AI platforms and standards.
Required Skills & Experience:
- Bachelor's or Master's degree in Computer Science, Software Engineering, Data Science, or a closely related discipline.
- Professional experience in an ML, software engineering, or related role, including 3+ years delivering AI/ML solutions in production.
- Strong Python development experience, with 3+ years of building and operating production services and APIs.
Generative AI & Agentic Systems
- Experience developing full-stack agentic solutions using agent frameworks (e.g., ADK, LangChain, LangGraph, CrewAI) and agent protocols (e.g., A2A, MCP), and familiarity with commercial and open-source foundation models.
- Experience building and operating advanced RAG and Agentic RAG systems using modern retrieval techniques (e.g., HyDE, RAPTOR, GraphRAG).
- Experience with agent monitoring, observability, and model evaluation frameworks to assess quality, safety, and performance in production.
ML, Platforms & Cloud
- Hands-on experience with ML/AI frameworks and data libraries such as PyTorch, Hugging Face, pandas, and NumPy.
- Hands-on experience with at least one public cloud AI/GenAI platform (e.g., AWS SageMaker and Bedrock, or Google Vertex AI, including Vertex AI Search and RAG Engine).
Software Engineering, DevOps & Security
- Experience designing and delivering production-grade APIs and microservices using modern software engineering practices.
- Hands-on experience with DevOps and CI/CD pipelines, infrastructure as code (e.g., Terraform), GitHub-based collaboration, and cloud deployments.
- Experience with DevSecOps tools such as Nexus, SonarQube, Checkmarx, and mcp-scan.
Ways of Working & Communication
- Experience working in lean, agile environments (e.g., SAFe or similar frameworks).
- Strong communication and collaboration skills, with the ability to explain complex technical concepts to technical and non-technical stakeholders, influence decisions, and work effectively across teams.