

AI Architect – Job Description
Role: AI Architect (Agentic AI Systems)
Context: Agentic analytics pipeline (root cause analysis, hypothesis testing, question expansion, report generation)
Stack: Python, LangGraph, LangChain, FastAPI, Celery, Redis, PostgreSQL, Databricks, Docker, Kubernetes, Azure, Context Engineering
About the Role
We are looking for an AI Architect who can own the end-to-end design and evolution of our agentic AI pipeline. You should be well versed in event-driven architecture and in using LangGraph to orchestrate multi-step workflows. You will define graph topologies, state schemas, routing and parallelization strategies, and production patterns for observability, checkpointing, and scale.
What You'll Do
·Design and evolve LangGraph pipelines: Define and implement state graphs, node contracts, conditional routing, and parallel execution (e.g. the Send API) for question expansion, data-sufficiency checks, and hypothesis-testing flows. Ensure clear state boundaries and reusable subgraphs.
·Own the agent framework and conventions: Maintain and extend our modular LangGraph framework, including base node abstractions, lifecycle decorators, automatic registration, and fluent graph builders. Keep patterns consistent and reduce boilerplate across nodes.
·Orchestrate multi-LLM and tool usage: Integrate and tune multiple LLM providers (OpenAI, Google Vertex AI, Anthropic, Mistral) and tool chains. Design tool contracts, error handling, and human-in-the-loop or clarification flows where needed.
·Productionize agent pipelines: Integrate with FastAPI, Celery, and Redis for async execution, streaming (e.g. SSE/Redis Streams), and event publishing. Ensure observability (e.g. Langfuse or similar), logging, and tracing for debugging and SLA monitoring.
·Scale execution and data: Evolve execution from in-process to distributed where needed (e.g. Celery workers, Databricks Serverless for heavy or sensitive code execution). Design for security, isolation, and cost.
·Collaborate with product and data: Turn product requirements into graph designs (nodes, edges, routing). Work with the data/analytics team on schemas, SQL validation, and statistical-testing integration so that agent outputs are reliable and interpretable.
·Design memory architectures: long-term (cross-session knowledge, vector-backed retrieval), short-term (working memory within agent runs), and episodic (learning from past analyses).
·Manage context windows: token budgeting across multi-step pipelines, summarization strategies, selective context injection, and graceful degradation when context limits are hit.
What We're Looking For
Must-have
·Strong Python software engineering skills, with experience in async programming (asyncio) and production APIs (e.g. FastAPI).
·Hands-on experience with LangGraph and LangChain (or equivalent agent/graph frameworks): building state graphs, conditional edges, subgraphs, and checkpointing. Understanding of state management and reducer patterns (e.g. operator.add to append to list fields).
·LLM integration experience: multiple providers, prompt design, tool/function calling, and basic cost/latency tradeoffs.
·Systems and production mindset: APIs, task queues (e.g. Celery), Redis, RabbitMQ, PostgreSQL. Comfort with Docker and basic DevOps (logging, health checks, env-based config).
·Experience deploying microservice architectures with Docker and Kubernetes.
·Ability to design for clarity and maintainability: modular graphs, clear node boundaries, and documentation of flows and state.
Nice-to-have
·Experience with MCP (Model Context Protocol) or similar agent tool protocols.
·Observability for LLM/agent systems (e.g. Langfuse, OpenTelemetry, or custom tracing).
·Databricks (or Spark): serverless jobs, notebooks, or code execution for large or sensitive data.
·Statistics/ML: hypothesis testing, EDA, or working with data scientists on automated analysis pipelines.
·Azure (or other cloud): Blob/Storage, app hosting, and security (e.g. OAuth, API keys).
·Experience with knowledge graphs.
How to Apply
Share a resume and a short note on:
1. A system you designed or significantly changed that involved agents, workflows, or multi-step LLM pipelines (what you built, the tradeoffs, and what you'd do differently).
2. Your experience with LangGraph/LangChain (or similar) and with production deployment of agent systems.
3. Your GitHub repositories or open-source (OSS) contributions.
Job ID: 144720329