
TeraOps is building the next generation of cloud cost and margin intelligence for modern, consumption-based platforms. As cloud-native architectures evolve to support AI-heavy workloads, traditional cost-optimization approaches break down. GPU runtimes, bursty inference patterns, and autonomous systems introduce a new class of cost risk that legacy tools cannot handle.
TeraOps addresses these challenges by combining deep cloud infrastructure intelligence with agentic AI systems that can analyze, reason about, and act on complex AWS environments. Our goal is not to build another reporting dashboard, but to create a system that understands cloud behavior and enforces economic discipline at scale.
The Role
We are hiring an AI & Agentic Systems Engineer to join the TeraOps engineering team. This role is designed for a hands-on AI expert who moves beyond simple prompt engineering to build sophisticated, multi-step agentic architectures that interact with real-world infrastructure.
You will serve as:
● The Agentic AI Subject Matter Expert (SME) for the TeraOps platform.
● An expert engineer building the reasoning engine that translates cloud metadata into autonomous actions.
● A technical leader defining how our agents adapt to new tools, manage memory, and optimize their own inference costs.
This is not a research role; it is a systems engineering role. You will guide how our AI observes, plans, and executes within AWS, ensuring our Expert Agents are as efficient as they are intelligent.
What You Will Do
● Design and implement multi-agent workflows using frameworks and protocols such as LangGraph, Bedrock Agents, or MCP to solve complex cloud optimization problems.
● Develop the Tools Adaptation layer to enable agents to surgically interact with AWS APIs (S3, EC2, RDS) for last-mile remediation.
● Define the safeguards and constraints that govern how AI systems act on customer environments to ensure risk-free execution.
2. Context Intelligence & Memory Management
● Build the Context Intelligence engine that ingests multi-dimensional data (CUR, CloudWatch, App Telemetry) to provide agents with a full-picture view of infrastructure.
● Implement advanced Memory Management strategies, including RAG (Retrieval-Augmented Generation) and MCP (Model Context Protocol), to ensure agents have the right data at the right time.
● Optimize retrieval patterns to reduce token noise, directly lowering the AI unit cost of the platform.
3. Inference Optimization & Model Routing
● Implement Model Routing logic to balance performance, latency, and cost—automatically choosing the right model (e.g., Claude 3.5 Sonnet vs. Haiku) based on task complexity.
● Track and optimize AI Unit Economics to ensure the platform remains profitable even during 10,000x usage spikes from power users.
● Design for Agentic Efficiency—reducing the number of reasoning loops required to reach a confident execution plan.
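As a rough illustration of the model-routing idea described in this section, here is a minimal sketch, not a description of the TeraOps implementation. The tier names, prices, complexity ceilings, and the scoring heuristic are all hypothetical placeholders chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str                  # illustrative label, not a real model ID
    cost_per_1k_tokens: float  # placeholder price, not a real quote
    max_complexity: int        # highest complexity score this tier should handle


# Ordered cheapest first; every value here is an assumption for the sketch.
TIERS = [
    ModelTier("small-fast", 0.001, 3),
    ModelTier("large-capable", 0.015, 10),
]


def estimate_complexity(task: str, tool_calls: int) -> int:
    """Crude heuristic: longer prompts and more expected tool calls score higher (1-10)."""
    score = 1 + len(task.split()) // 50 + 2 * tool_calls
    return min(score, 10)


def route(task: str, tool_calls: int = 0) -> ModelTier:
    """Return the cheapest tier whose ceiling covers the estimated complexity."""
    complexity = estimate_complexity(task, tool_calls)
    for tier in TIERS:
        if complexity <= tier.max_complexity:
            return tier
    return TIERS[-1]  # fall back to the most capable tier
```

In this sketch, a short lookup such as route("List idle EC2 instances") stays on the cheaper tier, while a task expected to need several tool calls is sent to the larger one. A production router would likely replace the heuristic with measured signals such as past success rates and latency budgets.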
4. Leadership
● Collaborate closely with AWS Architects to align AI reasoning with AWS Well-Architected principles.
● Own technical outcomes, not just code—success is measured by the realized savings and ROI our agents deliver to customers.
● Set the engineering standards for building, testing, and deploying agentic systems at scale.
Required Qualifications
● Deep LLM & Agentic Expertise: Proven experience building production-grade AI systems that go beyond simple chat—specifically involving agentic workflows, tool-use, and multi-step reasoning.
● Advanced Python Engineering: Mastery of Python for systems integration, with experience using AI and cloud SDKs (LangChain, OpenAI/Anthropic, Boto3).
● Memory & RAG Mastery: Strong understanding of vector databases, context window management, and semantic retrieval strategies.
● Cloud-Native Mindset: Familiarity with AWS services and how to programmatically observe and control them.
● Pragmatic AI Specialist: Ability to reason about the trade-offs between model accuracy, inference speed, and cost.
● Systems Thinker: Ability to design AI that can connect the dots between disparate data sources like billing logs and runtime telemetry.
● High Ownership Mentality: Self-directed and comfortable operating with the ambiguity of a founding-stage startup.
Why This Role Is Different
Job ID: 147499361