Agentic AI Infrastructure & Orchestration Engineer

inoptra digital

Bengaluru, India

5-7 Years

Posted a day ago

Job Description

We are looking for candidates based in the USA.

Location: Remote

Key Responsibilities

Architect and deploy agentic multi-agent AI frameworks.
Develop scalable pipelines integrating LLM → RAG → VectorDB → Agents.
Build and deploy MCP servers for agentic AI agents and integrations.
Build observability, latency optimization, and performance monitoring systems.
Implement self-refinement and feedback-loop learning architectures.

Multi-Agent System Architecture & Deployment

Architect, design, and deploy agentic multi-agent frameworks where multiple AI agents collaborate autonomously.
Design and implement inter-agent communication protocols, coordination strategies, and workflow orchestration layers.
Integrate with frameworks such as LangGraph, CrewAI, AutoGen, or Swarm to develop distributed, event-driven agentic ecosystems.
Develop containerized deployments (Docker / Kubernetes) for multi-agent clusters running in hybrid or multi-cloud environments.

Intelligent Pipeline Development

Build scalable end-to-end pipelines integrating LLMs → RAG → VectorDB → Agents, optimized for low latency and high retrieval quality.
Implement retrieval-augmented generation (RAG) architectures using FAISS, Chroma, Weaviate, Milvus, or Pinecone.
Develop embedding generation, storage, and query pipelines using OpenAI, Hugging Face, or locally hosted models.
Orchestrate data movement, context caching, and memory persistence for agentic reasoning loops.

Agentic Infrastructure & Orchestration

Build and maintain MCP (Model Context Protocol) servers for agentic AI agents and integrations.
Develop APIs, microservices, and serverless components for flexible integration with third-party systems.
Implement distributed task scheduling and event orchestration using Celery, Airflow, Temporal, or Prefect.

Observability, Performance, and Optimization

Build observability stacks for multi-agent systems with centralized logging, distributed tracing, and metrics visualization.
Optimize latency, throughput, and inference cost across LLM and RAG layers.
Implement performance benchmarking and automated regression testing for large-scale agent orchestration.
Monitor LLM response quality, drift, and fine-tuning performance through continuous feedback loops.

Self-Refining & Feedback Loop Architectures

Implement self-refinement and reinforcement-learning feedback mechanisms that let agents iteratively improve their performance.
Integrate auto-evaluation agents to assess output correctness and reduce hallucination.
Design memory systems (episodic, semantic, long-term) for adaptive agent learning and contextual persistence.
Experiment with tool use, chaining, and adaptive reasoning strategies to extend agents' autonomous capabilities.

Technical Skills Required

Programming: Expert-level Python (async, multiprocessing, API design, performance tuning).
LLM Ecosystem: Familiarity with OpenAI, Anthropic, Hugging Face, Ollama, LangChain, LangGraph, CrewAI, or AutoGen.
Databases: VectorDBs (FAISS, Weaviate, Milvus, Pinecone), NoSQL (MongoDB, Redis), SQL (PostgreSQL, MySQL).
Cloud Platforms: AWS / Azure / GCP; experience with Kubernetes, Docker, Terraform, and serverless architecture.
Observability: Prometheus, Grafana, OpenTelemetry, ELK Stack, Datadog, or New Relic.
CI/CD & DevOps: GitHub Actions, Jenkins, Argo CD, Cloud Build, and testing tools (pytest, Locust, etc.).
Other Tools: FastAPI, gRPC, REST, Kafka, Redis Streams, or event-driven frameworks.

Preferred Experience

Experience designing agentic workflows or AI orchestration systems in production environments.
Background in applied AI infrastructure, ML Ops, or distributed system design.
Exposure to RAG-based conversational AI or autonomous task delegation frameworks.
Strong understanding of context management, caching, and inference optimization for large models.
Experience with multi-agent benchmarking or simulation environments.

Soft Skills

Ability to translate conceptual AI architectures into production-grade systems.
Strong problem-solving and debugging capabilities in distributed environments.
Collaboration mindset – working closely with AI researchers, data scientists, and backend teams.
Passion for innovation in agentic intelligence, orchestration systems, and AI autonomy.

Education & Experience

Bachelor's or Master's in Computer Science, AI/ML, or related technical field.
5+ years of experience in backend, cloud, or AI infrastructure engineering.
2+ years in applied AI or LLM-based system development preferred.

Optional Nice-to-Haves

Knowledge of Reinforcement Learning from Human Feedback (RLHF) or self-improving AI systems.
Experience deploying on-premise or private LLMs or integrating custom fine-tuned models.
Familiarity with graph-based reasoning or knowledge representation systems.
Understanding of AI safety, alignment, and autonomous agent governance.