Lead AI Engineer - SRE, LLM Agents, Full-Stack Architecture

iVedha Inc.

India

8-10 Years

Posted a month ago

Job Description

About iVedha:

iVedha Inc. is a global AI-first digital transformation company with over 25 years of excellence. Powered by the iVedha Fabric, our AI-native operating system, we unify cloud, data, AI, security, and people to deliver measurable, resilient outcomes. Our expertise spans Agentic AI, Generative AI, Cloud Engineering, Cybersecurity, Data Modernization, Application Transformation, and Talent Enablement.

Join our team of forward-thinking innovators shaping the future of intelligent enterprises, where automation, observability, and AI-driven quality assurance redefine delivery velocity.

About the Opportunity

A leading financial institution is seeking a highly experienced Lead AI Engineer to join its advanced technology division. This is a high-impact, leadership-track role at the intersection of AI engineering, Site Reliability, and enterprise-grade software architecture. The successful candidate will design, build, and operationalize the next generation of agentic AI systems within a regulated banking environment, driving intelligent automation while maintaining the rigorous security, compliance, and availability standards demanded by the financial services industry.

You will architect multi-agent LLM systems, implement Model Context Protocol (MCP) servers, build production-grade RAG pipelines, and lead AI observability practices using the ELK stack. This role requires deep technical expertise combined with the leadership acumen to mentor engineers and influence cross-functional technical decisions.

Key Responsibilities

Pillar 1 — AI Architecture & Agentic Systems

Design and implement sophisticated LLM-powered agentic workflows and multi-agent architectures capable of autonomous reasoning, planning, and tool execution within secure financial system boundaries.
Architect and deploy scalable Model Context Protocol (MCP) servers to enable standardized, secure, and rich context management between AI models, internal banking APIs, and external data sources.
Develop production-grade Retrieval-Augmented Generation (RAG) and GraphRAG pipelines that ground AI agents in accurate, real-time enterprise financial data with full auditability.
Leverage expertise in Meta AI (Llama ecosystem), Google AI (Gemini, Vertex AI), and Microsoft Copilot to build and integrate cutting-edge AI features while adhering to financial data handling policies.
Implement prompt versioning, model drift detection, and automated evaluation pipelines to maintain AI system quality and regulatory compliance over time.

Pillar 2 — Full-Stack Engineering

Lead end-to-end development of robust, scalable AI applications using both Node.js (TypeScript) and Python (FastAPI/Django).
Champion AI-assisted developer workflows (Vibe Coding) using advanced tools such as Cursor and GitHub Copilot to improve team productivity and code quality.
Design and implement secure, high-performance RESTful and GraphQL APIs to serve LLM inference results and agentic actions to frontend and downstream systems.
Develop and maintain Bash and Python automation scripts for infrastructure management, deployment orchestration, and operational efficiency.
Mentor junior and mid-level engineers in AI-native development practices and modern architectural patterns.

Pillar 3 — Site Reliability Engineering & AI Observability

Implement comprehensive observability stacks using the ELK Stack (Elasticsearch, Logstash, Kibana), tuned for LLM-specific metrics such as latency, token usage, hallucination rates, and model drift indicators.
Apply SRE best practices to AI workloads, including high availability, fault tolerance, incident response playbooks, and SLO/SLA management for LLM inference services.
Build and maintain CI/CD pipelines tailored for machine learning models, including prompt versioning, model evaluation gates, shadow deployments, and automated rollback.
Design alerting, on-call runbooks, and escalation paths for AI system incidents within a regulated financial services environment.

Required Qualifications:

Programming Languages - Expert-level proficiency in Node.js (TypeScript/JavaScript) and Python. Both are required. Bash scripting for infrastructure automation is mandatory.
AI & Machine Learning - Deep understanding of LLM architectures, prompt engineering, fine-tuning techniques (LoRA/QLoRA), and embedding models. Proven experience building and operating production-grade LLM applications.
Agentic Frameworks - Hands-on experience designing autonomous agents and implementing Model Context Protocol (MCP) servers for standardized tool and context management.
RAG & Vector Databases - Strong experience building RAG and GraphRAG pipelines. Proficiency with vector databases (Pinecone, Milvus, or Weaviate) and embedding model selection strategies.
Observability & SRE - Extensive hands-on experience with the ELK Stack (Elasticsearch, Logstash, Kibana) for distributed system logging, monitoring, and AI-specific metrics tracking.
Cloud & Infrastructure - Proven experience with cloud-native architectures. Azure and AKS (Azure Kubernetes Service) experience strongly preferred for this engagement.
Enterprise AI Tools - Demonstrated expertise with Microsoft Copilot (Copilot Studio extensibility, custom connectors), Meta AI open-source models, and Google AI infrastructure (Gemini/Vertex AI).
Leadership - 8+ years of progressive software engineering experience. Minimum 3 years in a technical leadership or architectural role with a focus on AI/ML systems.

Banking & Compliance Requirements:

Given the regulated nature of this environment, candidates must demonstrate awareness of and experience with the following:

Working knowledge of SOC 2 Type II compliance principles and their impact on AI system design and data handling.
Understanding of financial data classification, PII protection, and audit trail requirements for AI-generated outputs.
Experience implementing secure credential management (e.g., Azure Key Vault, HashiCorp Vault) in production AI systems.
Familiarity with model governance requirements, including explainability, version control, and documentation for AI systems in regulated environments.
Knowledge of zero-trust security principles and least-privilege access patterns for AI agent tool integrations.

Preferred Qualifications:

Experience building or integrating AI observability platforms with OpenTelemetry for unified tracing across AI and infrastructure layers.
Elastic Certified Engineer or Elastic Certified Observability Engineer certification.
Familiarity with Elastic Agent and Fleet management for centralized log collection in enterprise environments.
Prior experience in financial services, banking technology, or fintech with exposure to trading systems, fraud detection, or compliance platforms.
Contributions to open-source AI/ML projects or published research in LLM applications.

Why This Role

This is a rare opportunity to be at the forefront of AI engineering within a major financial institution — building systems that push the boundaries of what autonomous agents can achieve within a complex, regulated enterprise. You will have direct architectural influence over the institution's AI transformation roadmap, work with cutting-edge models and frameworks, and lead a high-caliber engineering team. Your decisions will shape how AI is responsibly deployed in financial services for years to come.

About Company

iVedha Inc.

Job Source: www.linkedin.com

Job ID: 145596459

Last Updated: 18-05-2026 07:23:16 PM
