Recro

Generative AI Engineer

  • Posted 23 days ago

Job Description

About the job

Role Overview

As the AI Systems Architect, you'll own the end-to-end design and delivery of production-grade agentic and Generative AI systems. This is a highly hands-on role requiring deep architectural insight, coding proficiency, and an obsession with performance, scalability, and reliability. You'll architect secure, cost-efficient AI platforms on AWS, guide developers through complex debugging and optimization, and ensure all systems are observable, governed, and production-ready.

Key Responsibilities

  • Architect Production AI Systems: Design robust architectures for agentic systems (planning, reasoning, tool-calling), GenAI/RAG pipelines, and evaluation workflows. Create detailed design documents, including flow/UML/sequence diagrams and AWS deployment topologies.
  • Optimize for Cost & Performance: Model throughput, latency, concurrency, autoscaling, CPU/GPU sizing, and vector index performance to ensure scalable, efficient deployments.
  • Lead Debugging & Stability Efforts: Conduct deep-dive debugging, fix critical defects, and resolve production incidents; pair-program with developers to improve code quality and performance.
  • Standardize Agentic Frameworks: Build reference implementations using Semantic Kernel (preferred), LangGraph, AutoGen, or CrewAI with strong schema validation, grounding, and memory management.
  • Engineer Retrieval & Search Systems: Architect hybrid retrieval solutions including ingestion, chunking, embeddings, ranking, caching, and freshness management while minimizing hallucination risk.
  • Productionize on AWS: Deploy and manage systems using Amazon EKS, Bedrock, S3, SQS/SNS, RDS, and ElastiCache. Integrate IAM/Okta, Secrets Manager, and Datadog for observability, enforcing SLIs/SLOs and error budgets.
  • Implement Observability & Monitoring: Set up distributed tracing, metrics, and logging via OpenTelemetry and Datadog. Standardize dashboards, alerts, and incident response workflows.
  • Govern Evaluation & Rollouts: Build test and evaluation frameworks (golden sets, A/B experiments, regression suites, and controlled rollouts) to ensure consistent quality across releases.
  • Embed Security & Safety: Enforce least privilege, PII protection, and policy compliance through threat modeling, sandboxed execution, and prompt-injection defense.
  • Establish Engineering Standards: Create reusable SDKs, connectors, CI/CD templates, and architecture review checklists to promote consistency across teams.
  • Cross-Functional Leadership: Collaborate with product, data, and SRE teams for capacity planning, DR strategies, and post-incident RCA reviews. Mentor engineers to strengthen design and reliability practices.

Must-Have Qualifications

  • 7 to 10 years in software/AI engineering, including 4+ years in GenAI application development and 2+ years architecting agentic AI systems.
  • Expert in Python 3.11+ (asyncio, typing, packaging, profiling, pytest).
  • Hands-on experience with Semantic Kernel, LangGraph, AutoGen, or CrewAI.
  • Proven delivery of GenAI/RAG systems on AWS Bedrock or equivalent vector-based platforms (OpenSearch Serverless, Pinecone, Redis).
  • Deep understanding of the AWS ecosystem and surrounding tooling: EKS, Bedrock, S3, SQS/SNS, RDS, ElastiCache, Secrets Manager, IAM/Okta, Kong API Gateway, Datadog.
  • Expertise in observability and incident management using OpenTelemetry and Datadog.
  • Strong focus on cost, performance, and security engineering: FinOps mindset, autoscaling, caching, and policy enforcement.
  • Exceptional communication: clear diagrams, ADRs, and peer review practices.

Nice-to-Have Skills

  • Multi-agent orchestration (task decomposition, coordinator-worker, graph-based planning).
  • Expertise with vector databases (OpenSearch, Pinecone, pgvector, Redis).
  • Experience with AI evaluation, guardrails, and rollout gating.
  • Familiarity with frontend agent interfaces, secure APIs, and AuthN/Z best practices.
  • Exposure to policy-as-code, multi-tenant architectures, and feature management (Kong, LaunchDarkly, Flipt).
  • Experience with CI/CD via GitHub Actions and IaC (Terraform/AWS CloudFormation).

Job ID: 131843243