Role Summary
Own the end-to-end lifecycle of autonomous AI agents and internal AI accelerators that drive multi-crore business impact through automation and productivity transformation.
Build and scale the AI agent ecosystem from zero to enterprise production, delivering measurable ROI by automating engineering workflows with agents.
Key Responsibilities
AI Agent Platform Architecture
- Architect scalable, production-grade AI agent frameworks for enterprise deployment
- Design agent orchestration systems supporting complex multi-agent workflows
- Implement enterprise-grade monitoring, tracing, and performance observability
- Meet a 99.9% uptime SLA across all production agents
- Optimize for cost efficiency and performance at scale
Agent Development & Productionization
- Lead development of autonomous AI agents solving high-value business problems
- Implement advanced agent capabilities (tool calling, memory, reasoning, planning)
- Productionize agent deployments with robust error handling and recovery mechanisms
- Optimize inference costs and performance at enterprise scale (1B+ tokens/month)
- Establish production-readiness standards and deployment practices
Internal AI Accelerators
- Create reusable AI tools and accelerators for domain experts
- Package complex AI capabilities as low-code/no-code solutions
- Drive platform adoption across a large engineering user base (8,000+ users)
- Measure and demonstrate productivity impact and business value
- Build self-service AI capabilities for non-technical users
Enterprise Integration & MLOps
- Integrate the AI platform with the enterprise data lakehouse and analytics layer
- Implement comprehensive MLOps pipelines (CI/CD, model registry, versioning)
- Establish cost governance and optimization frameworks
- Enforce enterprise security, compliance, and data governance standards
- Implement monitoring dashboards for cost, performance, and availability
Platform Leadership & Strategy
- Define the AI agent platform roadmap and technology strategy
- Mentor junior AI engineers and establish best practices
- Collaborate with cloud vendors and technology partners
- Present platform impact and ROI to executive leadership
- Drive continuous optimization and innovation
Required Technical Expertise
Must Have
✅ 3+ years building with AI agent frameworks in production
(Mosaic AI, LangChain, crewAI, AutoGen, or equivalent)
✅ 2+ years of enterprise LLM deployments
(GPT-4o or equivalent, 1B+ tokens/month scale)
✅ Expert-level Python development
(FastAPI, agent orchestration, vector databases)
✅ Production MLOps experience
(model registry, tracing, monitoring, cost optimization)
✅ Enterprise-scale system design
(high availability, fault tolerance, observability, cost controls)
Preferred
- Engineering, consulting, or technology services industry experience
- Multi-modal AI (vision, document understanding, structured data)
- Large-scale data platform integration (lakehouse, real-time analytics)
- Databricks ecosystem or Azure cloud platform experience
Technical Tools & Stack
CORE TECHNOLOGIES:
- Python (3.8+, FastAPI, async frameworks)
- Databricks ML ecosystem (Mosaic AI, MLflow)
- Azure OpenAI or equivalent LLM APIs
- Vector databases (Pinecone, Weaviate, Qdrant, or Databricks Vector Search)
AGENT FRAMEWORKS:
- LangChain / LlamaIndex
- crewAI / AutoGen
- Custom orchestration frameworks
- RAG (Retrieval-Augmented Generation) systems
MLOPS STACK:
- MLflow (model registry, experiment tracking)
- Databricks Workflows / Apache Airflow
- Monitoring: Weights & Biases, Prometheus/Grafana
- CI/CD: GitHub Actions, GitLab CI, or Jenkins
CLOUD PLATFORMS:
- Azure (Databricks, Azure OpenAI, Fabric, Entra ID)
- AWS or GCP (equivalent enterprise experience acceptable)
- Containerization: Docker, Kubernetes basics
OPTIONAL BUT VALUABLE:
- Prompt engineering / few-shot learning
- Embeddings and semantic search
- Token optimization techniques
- Cost forecasting and budget management
Business Impact & Success Metrics
Platform Impact
- Business Value: Multi-crore annual impact through automation
- Engineering Efficiency: 20%+ productivity improvement across the user base
- Cost Discipline: Enterprise-scale inference cost optimization
- Strategic Advantage: First-mover AI capability in the domain