eightgen ai services

AI Engineer Lead Contractor

6-8 Years
  • Posted 5 hours ago

Job Description

**Company Description**

Eightgen is an AI services company that partners with founders, CIOs, and CXOs to transform ideas into working products. We help startups and enterprises ship AI automation at scale — from intelligent workflows and custom AI agents to enterprise-grade applications.

We are a fully remote team that values outcomes over hours and collaboration over hierarchy. We hire talented people, share context generously, and trust each other to make good decisions.

**Role Description**

We are hiring an AI Engineer Lead (Contract, 3 months initial with strong opportunity to extend for 6+ months) to be a hands-on technical leader for our AI engineering work. You will spend roughly 70% of your time designing, building, and shipping AI systems, and the remaining 30% providing technical direction, reviewing AI/ML work, and mentoring engineers on the team.

You will own the end-to-end design of the LLM-powered features, agents, and data pipelines your team builds — from prompt and retrieval strategy to evaluation, guardrails, and production deployment. This is not a pure research role, not a data-science/notebook role, and not a people-management role: we want a strong software engineer who can take an AI problem from a vague business goal to a reliable, evaluated, production-grade system — owning the services, APIs, and data flow around the model, not just the model — and lead a small team through it.

We are an AI-native engineering team. You will build with LLMs (as the product) and use AI coding assistants (Cursor, Claude Code, GitHub Copilot, or similar) as integral tools in your workflow, and you'll set that standard for the team. Much of our work involves multi-agent systems that orchestrate teams of LLM agents through long-running, human-in-the-loop workflows, so comfort building and reasoning about agentic systems is central to the role.

**Our AI Engineering Philosophy**

We believe the most effective AI engineers are those who:

• Measure before they trust — every agent, RAG pipeline, or fine-tune ships with an evaluation harness, a labeled dataset, and a clear definition of good enough; quality is gated on metrics, not vibes

• Treat AI systems as software — versioned prompts, reproducible pipelines, tests, and observability, not one-off notebook experiments

• Engineer around model limits — design for hallucination, latency, cost, and non-determinism from day one, with retries, fallbacks, and guardrails

• Stay pragmatic about the stack — reach for the simplest thing that works (a good prompt over a fine-tune, retrieval over a bigger model) and only add complexity when the metrics demand it

• Keep humans in control — AI accelerates the work, but quality, safety, and correctness remain the engineer's responsibility

**Key Responsibilities**

• Lead AI delivery end-to-end — own the design and delivery of the LLM features, agents, and pipelines your team is building, define standards within that scope, and ship reliable, maintainable AI systems on time

• Design agentic AI systems — produce technical designs for RAG pipelines, multi-step and multi-agent (lead + sub-agent) systems, tool-use/function-calling flows, and long-running orchestrations with human-in-the-loop gates, with a clear eye on accuracy, latency, cost, and failure modes

• Build evaluation and observability — define metrics, build eval datasets and harnesses, and instrument LLM calls so quality and regressions are visible, not guessed at

• Govern model cost and routing — route work across model tiers, set budget guards, and apply context/token-management strategies so systems stay within cost and latency targets without sacrificing quality

• Stay hands-on — contribute directly across prompt engineering, retrieval, agent orchestration, model integration, the supporting backend services and APIs, and data pipelines — leading by example, not just by review

• Engineer for production — bake in cost controls, rate-limit handling, caching, guardrails, prompt-injection defenses, secure credential handling, and PII/data handling as first-class concerns

• Raise the bar — conduct thorough reviews of prompts, pipelines, and code; provide actionable feedback; and grow the AI engineering capability of those around you

• Make pragmatic trade-off calls — weigh prompt-vs-fine-tune, build-vs-buy, model-vs-cost, and speed-vs-accuracy decisions within your area and clearly articulate the reasoning

• Collaborate cross-functionally — partner with product, design, and business stakeholders to turn ambiguous goals into well-scoped, well-evaluated AI work

**Qualifications**

Required:

• 6+ years of professional software engineering experience, including 2+ years building production LLM / AI-powered systems (not just prototypes)

• Strong applied LLM experience — production work with the OpenAI, Anthropic, or open-weight model APIs, including prompt engineering, structured output, and function/tool calling

• Multi-agent orchestration experience — building multi-step and multi-agent systems (lead + sub-agent teams, tool-using agents) with agent frameworks (Claude Agent SDK, LangChain, LlamaIndex) or equivalent, or directly against model SDKs, including parsing streamed structured output and managing long-running agent sessions

• Long-running, human-in-the-loop pipeline orchestration — has built stateful, resumable workflows (state machines or equivalent) with approval/milestone gates, recovery, and clear stage hand-offs

• RAG and retrieval expertise — chunking and embedding strategies, vector stores (pgvector, Pinecone, Weaviate, or similar), and retrieval evaluation/tuning

• Evaluation discipline (core to this role) — has built eval datasets and offline/online eval harnesses for non-deterministic systems, defined precision/quality metrics, and used them as a regression gate on prompt and pipeline changes

• Deep Python expertise — production experience with FastAPI (our primary backend framework), async patterns, type hints, Pydantic v2, and modern Python best practices

• Solid backend and data fundamentals — API design, SQL and data modelling (PostgreSQL or similar), and building the services and pipelines that AI features depend on

• Cloud platform experience — production experience on Google Cloud Platform (Cloud Run, Cloud SQL, GCS) or equivalent AWS/Azure services, with a practical grasp of IAM, secrets, and cost trade-offs

• Demonstrated technical leadership — has led engineering work through code/design reviews, operational ownership, or mentoring

• Hands-on experience with AI coding assistants such as Cursor, Claude Code, GitHub Copilot, or similar tools in day-to-day workflows

• Strong review instincts for AI-generated output — able to spot subtle bugs, security issues, or architectural missteps in AI-assisted code, and able to guide teams on using AI tools effectively and critically

Preferred:

• Experience with multi-tier model routing & cost governance — routing work across model tiers per task, enforcing budget limits, and applying context/token-compaction strategies to control cost and latency

• Experience with real-time streaming of LLM output to clients (Server-Sent Events or WebSockets), including replay/late-join handling

• Experience with secure credential handling — encrypting third-party/provider tokens at rest (e.g., Fernet), JWT-based auth, and rate limiting

• Experience with sandboxed / subprocess code execution and Docker / Docker Compose orchestration of ephemeral environments

• Experience with fine-tuning, LoRA/PEFT, or model distillation, and a clear sense of when not to fine-tune

• Familiarity with inference optimization — quantization, batching, streaming, and serving open-weight models (vLLM, Ollama, TGI)

• Experience with prompt-injection / LLM security and safe handling of untrusted input and PII

• Background in data-intensive applications — pipelines, analytics, or enterprise integrations

• Experience with LLM observability/eval tooling (LangSmith, Langfuse, Arize, Ragas, or similar)

• Prior work in early-stage or consulting environments where scope evolves quickly and engineers wear multiple hats

**Technical Environment**

Our primary stack:

• Language and frameworks: Python 3.11+ (FastAPI, Pydantic v2, Typer)

• AI/LLM stack: Anthropic (primary), OpenAI, and open-weight models; Claude Agent SDK / Claude Code CLI; LangChain, LlamaIndex; function calling and structured stream-JSON output

• Agent orchestration: custom state-machine orchestrators with lead + sub-agent teams and human-in-the-loop milestone gates

• Retrieval: pgvector (primary), Pinecone, Weaviate, and Qdrant

• Evaluation and observability: Langfuse/LangSmith/Ragas and custom eval harnesses, with OpenTelemetry for LLM tracing

• Streaming: SSE with ring-buffer replay

• Data stores: PostgreSQL (primary), with ClickHouse, BigQuery, Redis, and MongoDB

• Auth and security: JWT auth with Fernet credential encryption

• Cloud: Google Cloud Platform (Cloud Run, Cloud SQL, GCS)

• Inference: vLLM/Ollama/TGI

• DevOps: Docker, GitHub Actions, and Terraform

We equally value experience with comparable tools, such as Temporal/Prefect/Dagster, AWS or Azure, and Node.js/TypeScript or Django. The underlying skills transfer.

**Engagement Details**

• Contract Type: Contractor (3 months initial, with strong opportunity to extend for 6+ months)

• Location: Fully Remote

• Start Date: Immediately

**How to Apply**

Apply directly via eightgen.ai/careers. Please include:

• Your resume/CV, highlighting relevant AI engineering experience

• A brief description of an LLM-powered system you designed and shipped to production (the problem, your key design choices around retrieval/agents/evals/guardrails, and what you would do differently today)

• A description of a multi-agent or long-running orchestrated system you built (how you handled state/recovery, agent hand-offs, and cost control)

• How you evaluate and monitor the quality of an AI system, with a concrete example

• A time you reduced false positives or improved the precision of an AI system

• Examples of how you use AI coding tools in your workflow

• Your availability and expected rate

Job ID: 149085633