AI/ML Engineer – Agentic AI

  • Posted 12 hours ago

Job Description

Company Overview

KLA is a global leader in diversified electronics for the semiconductor manufacturing ecosystem. Virtually every electronic device in the world is produced using our technologies. No laptop, smartphone, wearable device, voice-controlled gadget, flexible screen, VR device or smart car would have made it into your hands without us. KLA invents systems and solutions for the manufacturing of wafers and reticles, integrated circuits, packaging, printed circuit boards and flat panel displays. The innovative ideas and devices that are advancing humanity all begin with inspiration, research and development. KLA places an above-average focus on innovation, investing 15% of sales back into R&D. Our expert teams of physicists, engineers, data scientists and problem-solvers work together with the world's leading technology providers to accelerate the delivery of tomorrow's electronic devices. Life here is exciting and our teams thrive on tackling really hard problems. There is never a dull moment with us.

Group/Division

The Information Technology (IT) group at KLA is involved in every aspect of the global business. IT's mission is to enable business growth and productivity by connecting people, process, and technology. It focuses not only on enhancing the technology that enables our business to thrive but also on how employees use and are empowered by technology. This integrated approach to customer service, creativity and technological excellence enables employee productivity, business analytics, and process excellence.

Job Description/Preferred Qualifications

Position Overview

We are seeking a hands-on AI/ML Engineer specializing in AI agent systems to design and implement agentic workflows that can plan, reason, use tools, and safely execute tasks across complex enterprise environments. You will build orchestration layers, tool interfaces, memory and state management, and evaluation/guardrails to ensure agents are reliable, secure, and production-ready.

Key Responsibilities

  • Agent Architecture & Orchestration

      • Design and implement agent architectures (single-agent and multi-agent) with robust planning, tool use, and state management.

      • Build orchestration patterns such as supervisor/worker, router-based specialization, and iterative refinement loops.

      • Develop reusable agent frameworks including prompt templates, tool schemas, and policy-based controls.

  • Tooling, Integrations & Automation

      • Create tool interfaces for internal services and data sources (APIs, databases, ticketing, knowledge bases) with strong typing and validation.

      • Implement safe execution patterns (sandboxing where appropriate, permission gating, step limits, and deterministic fallbacks).

      • Integrate agents into user-facing and backend workflows (chat, copilots, automation pipelines).

  • Reliability, Safety & Guardrails

      • Implement guardrails for tool use, data access, and response policies (PII handling, prompt-injection resistance, output constraints).

      • Build monitoring for agent behavior: tool-call success rates, failure modes, loops, latency, cost, and user satisfaction signals.

      • Run incident response and post-mortems for agent failures; improve robustness via systemic fixes and runbooks.

  • Evaluation & Continuous Improvement

      • Design evaluation suites for agent behavior (task success rate, tool correctness, factuality/grounding when retrieval is used).

      • Build regression testing and canary releases to safely ship updates to prompts, tools, and models.

      • Develop feedback loops using user signals, targeted labeling, and automated test generation for recurring failure patterns.

  • Performance & Cost Optimization

      • Optimize agent latency and cost using caching, memoization, selective tool calling, context management, and lightweight models where appropriate.

      • Implement rate limiting, retries, circuit breakers, and queueing strategies to protect downstream dependencies.

  • Collaboration & Documentation

      • Partner with product and engineering teams to translate business workflows into agent designs and measurable success criteria.

      • Document patterns, best practices, and reference implementations for teams adopting agentic systems.

Required Qualifications

  • Bachelor's degree in Computer Science, Engineering, Data Science, Human-Computer Interaction, or a related field with 5+ years of relevant experience OR a Master's/PhD with 3+ years of relevant experience.

  • Strong programming skills in Python and experience building LLM-powered applications with tool/function calling.

  • Experience designing APIs/integrations and building secure, maintainable services.

  • Understanding of reliability engineering concepts (observability, incident response, safe rollouts).

  • Experience implementing structured outputs (schemas), validation, and error-handling for production systems.

  • Strong communication and ability to work effectively in cross-functional teams.

Preferred Qualifications

  • Experience with multi-agent orchestration patterns (supervisor/worker, planner/executor) and stateful workflows.

  • Experience with prompt injection defenses, safety policies, and data governance for enterprise AI.

  • Experience with evaluation frameworks for agentic systems (task benchmarks, simulation, golden tasks, human-in-the-loop review).

  • Experience integrating retrieval (RAG) into agents for grounded reasoning and citations.

  • Experience with workflow engines/queues (e.g., Airflow, Temporal, Celery) and distributed systems patterns.

What Success Looks Like (First 6-12 Months)

  • Agent workflows that reliably complete targeted tasks with measurable success metrics and low operational burden.

  • Safe tool-use and permissioning that prevents unintended actions and protects sensitive systems.

  • A strong eval + regression pipeline enabling fast iteration without quality regressions.

  • Improved latency/cost through optimized tool calling, context control, and operational safeguards.

Minimum Qualifications

Doctorate (Academic) or work experience of 0 years, Master's Level Degree or work experience of 2 years, Bachelor's Level Degree or work experience of 3 years

Be aware of potentially fraudulent job postings or suspicious recruiting activity by persons that are currently posing as KLA employees. KLA never asks for any financial compensation to be considered for an interview, to become an employee, or for equipment. Further, KLA does not work with any recruiters or third parties who charge such fees either directly or on behalf of KLA. Please ensure that you have searched for legitimate job postings. KLA follows a recruiting process that involves multiple interviews in person or on video conferencing with our hiring managers. If you are concerned that a communication, an interview, an offer of employment, or that an employee is not legitimate, please send an email to to confirm the person you are communicating with is an employee. We take your privacy very seriously and confidentially handle your information.

About Company

KLA Corporation is a capital equipment company based in Milpitas, California. It supplies process control and yield management systems for the semiconductor industry and other related nanoelectronics industries. The company's products and services are intended for all phases of wafer, reticle, integrated circuit (IC) and packaging production, from research and development to final volume manufacturing.

Job ID: 143746095