AI Engineer Lead Contractor

eightgen ai services

India

6-8 Years

Job Description

**Company Description**

Eightgen is an AI services company that partners with founders, CIOs, and CXOs to transform ideas into working products. We help startups and enterprises ship AI automation at scale — from intelligent workflows and custom AI agents to enterprise-grade applications.

We are a fully remote team that values outcomes over hours and collaboration over hierarchy. We hire talented people, share context generously, and trust each other to make good decisions.

**Role Description**

We are hiring an AI Engineer Lead (Contract, 3 months initial with strong opportunity to extend for 6+ months) to be a hands-on technical leader for our AI engineering work. You will spend roughly 70% of your time designing, building, and shipping AI systems, and the remaining 30% providing technical direction, reviewing AI/ML work, and mentoring engineers on the team.

You will own the end-to-end design of the LLM-powered features, agents, and data pipelines your team builds, from prompt and retrieval strategy to evaluation, guardrails, and production deployment. This is not a pure research role, a data-science/notebook role, or a people-management role. We want a strong software engineer who can take an AI problem from a vague business goal to a reliable, evaluated, production-grade system, owning the services, APIs, and data flow around the model rather than just the model, and who can lead a small team through it.

We are an AI-native engineering team. You will build with LLMs as the product and use AI coding assistants (Cursor, Claude Code, GitHub Copilot, or similar) as integral tools in your workflow, and you will set that standard for the team. Much of our work involves multi-agent systems that orchestrate teams of LLM agents through long-running, human-in-the-loop workflows, so comfort building and reasoning about agentic systems is central to the role.

**Our AI Engineering Philosophy**

We believe the most effective AI engineers are those who:

• Measure before they trust — every agent, RAG pipeline, or fine-tune ships with an evaluation harness, a labeled dataset, and a clear definition of good enough; quality is gated on metrics, not vibes

• Treat AI systems as software — versioned prompts, reproducible pipelines, tests, and observability, not one-off notebook experiments

• Engineer around model limits — design for hallucination, latency, cost, and non-determinism from day one, with retries, fallbacks, and guardrails

• Stay pragmatic about the stack — reach for the simplest thing that works (a good prompt over a fine-tune, retrieval over a bigger model) and only add complexity when the metrics demand it

• Keep humans in control — AI accelerates the work, but quality, safety, and correctness remain the engineer's responsibility

**Key Responsibilities**

• Lead AI delivery end-to-end — own the design and delivery of the LLM features, agents, and pipelines your team is building, define standards within that scope, and ship reliable, maintainable AI systems on time

• Design agentic AI systems — produce technical designs for RAG pipelines, multi-step and multi-agent (lead + sub-agent) systems, tool-use/function-calling flows, and long-running orchestrations with human-in-the-loop gates, with a clear eye on accuracy, latency, cost, and failure modes

• Build evaluation and observability — define metrics, build eval datasets and harnesses, and instrument LLM calls so quality and regressions are visible, not guessed at

• Govern model cost and routing — route work across model tiers, set budget guards, and apply context/token-management strategies so systems stay within cost and latency targets without sacrificing quality

• Stay hands-on — contribute directly across prompt engineering, retrieval, agent orchestration, model integration, the supporting backend services and APIs, and data pipelines — leading by example, not just by review

• Engineer for production — bake in cost controls, rate-limit handling, caching, guardrails, prompt-injection defenses, secure credential handling, and PII/data handling as first-class concerns

• Raise the bar — conduct thorough reviews of prompts, pipelines, and code; provide actionable feedback; and grow the AI engineering capability of those around you

• Make pragmatic trade-off calls — weigh prompt-vs-fine-tune, build-vs-buy, model-vs-cost, and speed-vs-accuracy decisions within your area and clearly articulate the reasoning

• Collaborate cross-functionally — partner with product, design, and business stakeholders to turn ambiguous goals into well-scoped, well-evaluated AI work

**Qualifications**

Required:

• 6+ years of professional software engineering experience, including 2+ years building production LLM/AI-powered systems (not just prototypes)

• Strong applied LLM experience — production work with the OpenAI, Anthropic, or open-weight model APIs, including prompt engineering, structured output, and function/tool calling

• Multi-agent orchestration experience — building multi-step and multi-agent systems (lead + sub-agent teams, tool-using agents) with agent frameworks (Claude Agent SDK, LangChain, LlamaIndex) or equivalent, or directly against model SDKs, including parsing streamed structured output and managing long-running agent sessions

• Long-running, human-in-the-loop pipeline orchestration — has built stateful, resumable workflows (state machines or equivalent) with approval/milestone gates, recovery, and clear stage hand-offs

• RAG and retrieval expertise — chunking and embedding strategies, vector stores (pgvector, Pinecone, Weaviate, or similar), and retrieval evaluation/tuning

• Evaluation discipline (core to this role) — has built eval datasets and offline/online eval harnesses for non-deterministic systems, defined precision/quality metrics, and used them as a regression gate on prompt and pipeline changes

• Deep Python expertise — production experience with FastAPI (our primary backend framework), async patterns, type hints, Pydantic v2, and modern Python best practices

• Solid backend and data fundamentals — API design, SQL and data modelling (PostgreSQL or similar), and building the services and pipelines that AI features depend on

• Cloud platform experience — production experience on Google Cloud Platform (Cloud Run, Cloud SQL, GCS) or equivalent AWS/Azure services, with a practical grasp of IAM, secrets, and cost trade-offs

• Demonstrated technical leadership — has led engineering work through code/design reviews, operational ownership, or mentoring

• Hands-on experience with AI coding assistants such as Cursor, Claude Code, GitHub Copilot, or similar tools in day-to-day workflows

• Strong review instincts for AI-generated output — able to spot subtle bugs, security issues, or architectural missteps in AI-assisted code, and able to guide teams on using AI tools effectively and critically

Preferred:

• Experience with multi-tier model routing & cost governance — routing work across model tiers per task, enforcing budget limits, and applying context/token-compaction strategies to control cost and latency

• Experience with real-time streaming of LLM output to clients (Server-Sent Events or WebSockets), including replay/late-join handling

• Experience with secure credential handling — encrypting third-party/provider tokens at rest (e.g., Fernet), JWT-based auth, and rate limiting

• Experience with sandboxed / subprocess code execution and Docker / Docker Compose orchestration of ephemeral environments

• Experience with fine-tuning, LoRA/PEFT, or model distillation, and a clear sense of when not to fine-tune

• Familiarity with inference optimization — quantization, batching, streaming, and serving open-weight models (vLLM, Ollama, TGI)

• Experience with prompt-injection / LLM security and safe handling of untrusted input and PII

• Background in data-intensive applications — pipelines, analytics, or enterprise integrations

• Experience with LLM observability/eval tooling (LangSmith, Langfuse, Arize, Ragas, or similar)

• Prior work in early-stage or consulting environments where scope evolves quickly and engineers wear multiple hats

**Technical Environment**

Our primary stack:

• Language and frameworks: Python 3.11+ (FastAPI, Pydantic v2, Typer)

• AI/LLM: Anthropic (primary), OpenAI, and open-weight models; Claude Agent SDK / Claude Code CLI; LangChain and LlamaIndex; function calling and structured stream-JSON output

• Orchestration: custom state-machine agent orchestrators with lead + sub-agent teams and human-in-the-loop milestone gates

• Retrieval: pgvector (primary), Pinecone, Weaviate, and Qdrant

• Evaluation and observability: Langfuse, LangSmith, Ragas, and custom eval harnesses, with OpenTelemetry for LLM tracing

• Streaming: SSE with ring-buffer replay

• Data: PostgreSQL (primary) with ClickHouse, BigQuery, Redis, and MongoDB

• Security: JWT auth with Fernet credential encryption

• Cloud: Google Cloud Platform (Cloud Run, Cloud SQL, GCS)

• Inference: vLLM, Ollama, and TGI

• DevOps: Docker, GitHub Actions, and Terraform

We equally value experience with comparable tools, such as Temporal/Prefect/Dagster, AWS or Azure, and Node.js/TypeScript or Django. The underlying skills transfer.

**Engagement Details**

• Contract Type: Contractor (3 months initial, with strong opportunity to extend for 6+ months)

• Location: Fully Remote

• Start Date: Immediately

**How to Apply**

Apply directly via eightgen.ai/careers. Please include:

• Your resume/CV highlighting relevant AI engineering experience

• A brief description of an LLM-powered system you designed and shipped to production (the problem, your key design choices around retrieval/agents/evals/guardrails, and what you would do differently today)

• A description of a multi-agent or long-running orchestrated system you built (how you handled state/recovery, agent hand-offs, and cost control)

• How you evaluate and monitor the quality of an AI system, with a concrete example

• A time you reduced false positives or improved the precision of an AI system

• Examples of how you use AI coding tools in your workflow

• Your availability and expected rate