inoptra digital

Agentic AI Infrastructure & Orchestration Engineer

  • Posted a day ago

Job Description

Open only to candidates based in the USA.

Location: Remote

Key Responsibilities

  • Architect and deploy agentic multi-agent AI frameworks.
  • Develop scalable pipelines integrating LLM → RAG → VectorDB → Agents.
  • Build and deploy MCP (Model Context Protocol) servers for agentic AI agents and integrations.
  • Build observability, latency optimization, and performance monitoring systems.
  • Implement self-refining, feedback-loop learning architectures.
  • Multi-Agent System Architecture & Deployment:
      • Architect, design, and deploy agentic multi-agent frameworks where multiple AI agents collaborate autonomously.
      • Design and implement inter-agent communication protocols, coordination strategies, and workflow orchestration layers.
      • Integrate with frameworks such as LangGraph, CrewAI, AutoGen, or Swarm to develop distributed, event-driven agentic ecosystems.
      • Develop containerized deployments (Docker / Kubernetes) for multi-agent clusters running in hybrid or multi-cloud environments.
  • Intelligent Pipeline Development:
      • Build end-to-end scalable pipelines integrating LLMs → RAG → VectorDB → Agents, ensuring optimal latency and retrieval quality.
      • Implement retrieval-augmented generation (RAG) architectures using FAISS, Chroma, Weaviate, Milvus, or Pinecone.
      • Develop embedding generation, storage, and query pipelines using OpenAI, Hugging Face, or local LLMs.
      • Orchestrate data movement, context caching, and memory persistence for agentic reasoning loops.
  • Agentic Infrastructure & Orchestration:
      • Build and maintain MCP (Model Context Protocol) servers for Agentic AI agents and integrations.
      • Develop APIs, microservices, and serverless components for flexible integration with third-party systems.
      • Implement distributed task scheduling and event orchestration using Celery, Airflow, Temporal, or Prefect.
  • Observability, Performance, and Optimization:
      • Build observability stacks for multi-agent systems with centralized logging, distributed tracing, and metrics visualization.
      • Optimize latency, throughput, and inference cost across LLM and RAG layers.
      • Implement performance benchmarking and automated regression testing for large-scale agent orchestration.
      • Monitor LLM response quality, drift, and fine-tuning performance through continuous feedback loops.
  • Self-Refining & Feedback Loop Architectures:
      • Implement self-refining / reinforcement learning feedback mechanisms for agents to iteratively improve their performance.
      • Integrate auto-evaluation agents to assess output correctness and reduce hallucination.
      • Design memory systems (episodic, semantic, long-term) for adaptive agent learning and contextual persistence.
      • Experiment with tool-use capabilities, chaining, and adaptive reasoning strategies to enhance autonomous capabilities.

Technical Skills Required

  • Programming: Expert-level Python (async, multiprocessing, API design, performance tuning).
  • LLM Ecosystem: Familiarity with OpenAI, Anthropic, Hugging Face, Ollama, LangChain, LangGraph, CrewAI, or AutoGen.
  • Databases: VectorDBs (FAISS, Weaviate, Milvus, Pinecone), NoSQL (MongoDB, Redis), SQL (PostgreSQL, MySQL).
  • Cloud Platforms: AWS / Azure / GCP; experience with Kubernetes, Docker, Terraform, and serverless architecture.
  • Observability: Prometheus, Grafana, OpenTelemetry, ELK Stack, Datadog, or New Relic.
  • CI/CD & DevOps: GitHub Actions, Jenkins, Argo CD, Cloud Build, and testing frameworks (pytest, Locust, etc.).
  • Other Tools: FastAPI, gRPC, REST, Kafka, Redis Streams, or event-driven frameworks.

Preferred Experience

  • Experience designing agentic workflows or AI orchestration systems in production environments.
  • Background in applied AI infrastructure, MLOps, or distributed system design.
  • Exposure to RAG-based conversational AI or autonomous task delegation frameworks.
  • Strong understanding of context management, caching, and inference optimization for large models.
  • Experience with multi-agent benchmarking or simulation environments.

Soft Skills

  • Ability to translate conceptual AI architectures into production-grade systems.
  • Strong problem-solving and debugging capabilities in distributed environments.
  • Collaboration mindset – working closely with AI researchers, data scientists, and backend teams.
  • Passion for innovation in agentic intelligence, orchestration systems, and AI autonomy.

Education & Experience

  • Bachelor's or Master's in Computer Science, AI/ML, or related technical field.
  • 5+ years of experience in backend, cloud, or AI infrastructure engineering.
  • 2+ years in applied AI or LLM-based system development preferred.

Optional Nice-to-Haves

  • Knowledge of Reinforcement Learning from Human Feedback (RLHF) or self-improving AI systems.
  • Experience deploying on-premise or private LLMs or integrating custom fine-tuned models.
  • Familiarity with graph-based reasoning or knowledge representation systems.
  • Understanding of AI safety, alignment, and autonomous agent governance.

Job ID: 148227133