
Centific

Principal Agentic Architect

  • Posted 5 days ago

Job Description

Location

Hybrid / Remote (global, aligned to target customer time zones)

Role Type

Full-time | Principal Level

Role Overview

Centific's DAC (Digital Architecture & Cognitive) Command is expanding its global architecture unit to build and operationalize agentic, AI-driven business automation at production scale.

In this role, you will act as the end-to-end design authority for agentic inference solutions, owning outcomes from blueprint to live operations. You will architect multi-agent systems, runtime orchestration, and operational guardrails that meet demanding non-functional requirements (latency, reliability, cost, and security).

This is a hands-on role. You will prototype reference implementations, tune runtime behavior, and partner with engineering, platform, security, and product stakeholders to deliver production-first agentic systems.

Key Responsibilities

1. Agentic System Architecture & Orchestration

  • Design multi-agent architectures (planner-executor, supervisor loops, routing/dispatch, delegation, reflection/verification patterns) aligned to business workflows.
  • Define orchestration mechanisms for state/session handling, memory (short/long-term), tool invocation, retrieval/RAG, and structured I/O.
  • Establish standards for prompt/agent templates, tool/skill contracts, agent-to-agent messaging, and deterministic fallbacks.
  • Create reference implementations that teams can extend safely (agent frameworks, orchestration services, reusable libraries).

2. NFR-Driven Design for Production Inference

  • Own non-functional design (latency, throughput, scalability, reliability, availability, cost) as first-class requirements.
  • Design for performance and cost: token budgeting, caching strategies, batching, streaming responses, concurrency controls, and adaptive routing.
  • Define resilience patterns: timeouts, retries, circuit breakers, idempotency, queue back-pressure, graceful degradation, and safe-mode behavior.
  • Drive architecture decisions that balance quality vs. cost vs. speed, documenting trade-offs and expected SLOs/SLAs.

3. Solution Blueprint Ownership & End-to-End Delivery

  • Own the end-to-end solution blueprint from concept through production rollout (architecture, integration, testing, operations).
  • Translate business intent into system decomposition (services, agents, tools, data flows) with clear ownership boundaries and contracts.
  • Collaborate with Solution Blueprint Architects, Platform Architects, Data/Governance, and Security/Compliance to align constraints early.
  • Deliver architecture artifacts: sequence diagrams, decision records (ADRs), integration specs, runbooks, acceptance criteria, and launch checklists.

4. Integration Governance & Platform Compatibility

  • Set integration standards for APIs/events (versioning, compatibility contracts, error semantics, schema governance).
  • Define interfaces for tool invocation (capabilities registry, permissions, rate limits, safe parameterization).
  • Ensure agentic systems integrate cleanly with enterprise platforms (IAM, logging, monitoring, workflow engines, data platforms).
  • Partner with enterprise architecture to ensure interoperability across domains and prevent fragmentation.

5. Operational Readiness & Reliability

  • Design and enforce operational guardrails: monitoring, alerting, evaluation hooks, rollback plans, and safety kill-switches.
  • Establish runbooks for incident response, model/agent degradation, and dependency failures (tools, data sources, external APIs).
  • Define observability standards for agent traces, tool calls, prompts/responses, evaluation scores, and cost telemetry.
  • Lead postmortems and reliability improvements; ensure corrective actions are implemented and verified.

6. Technical Leadership & Enablement

  • Act as a principal technical leader, aligning cross-functional teams on architecture, roadmap, and delivery priorities.
  • Mentor engineers/architects on agentic design patterns, evaluation, and production hardening.
  • Drive reuse: shared components, gold-standard reference flows, and platform primitives that accelerate delivery.
  • Contribute to architecture councils/design reviews; influence standards and best practices across DAC Command.

Required Experience & Skills

Core Experience

  • 10-15+ years in software/platform engineering with 5+ years in solution/AI/platform architecture roles.
  • Proven delivery of production-grade AI/LLM systems (not just prototypes), including operational ownership considerations.
  • Strong background in distributed systems, API/event-driven integration, and reliability engineering.

Agentic AI & LLM Runtime Expertise (Hands-On)

  • Deep experience with agentic patterns: multi-agent coordination, planning, tool calling, routing, memory, and state management.
  • Experience optimizing LLM inference: caching, batching, token/latency management, throughput tuning, and quality-cost trade-offs.
  • Strong understanding of evaluation strategies (offline/online), prompt/agent regression testing, and release gates.
  • Familiarity with common orchestration frameworks and patterns (e.g., graph-based agent flows, tool registries, function calling).

Platform & Operations

  • Strong cloud-native architecture experience (AWS/Azure/GCP), microservices, event streaming, and container/Kubernetes ecosystems.
  • Hands-on with observability stacks (logs/metrics/traces), SLO/error budgets, incident response practices, and postmortems.
  • Ability to design secure-by-default tool access patterns (least privilege, scoped tokens, auditability).

Soft Skills & Ways of Working

  • Production-first mindset: design for operability, safety, and reliability from day one.
  • Strong systems thinking: can reason across product, platform, data, security, and cost dimensions.
  • Clear communicator: able to explain architecture trade-offs to engineers, product, and executive stakeholders.
  • Bias for action: prototypes quickly, then codifies reusable standards and reference implementations.
  • Collaborative leadership: aligns teams without relying on formal authority.

Nice-to-Have / Preferred

  • Experience with large-scale workflow orchestration and automation platforms (BPM/workflow engines, event-driven pipelines).
  • Experience implementing agent observability and evaluation harnesses at scale.
  • Background in regulated environments (SOC2, HIPAA, PCI, CJIS) and designing AI systems with audit-ready traces.
  • Open-source contributions, talks, or published work in agentic systems, LLM infrastructure, or reliability engineering.

What Success Looks Like (First 12-18 Months)

  • Agentic reference architectures and runtime standards are adopted across DAC Command deliveries.
  • Production deployments meet defined SLOs for latency, availability, and cost; incident rates decline over time through reliability improvements.
  • Reusable orchestration primitives (routing, memory, tool registry, evaluation hooks) accelerate new use cases and reduce duplication.
  • Integration governance prevents fragmentation: APIs/events are versioned, compatible, and observable.
  • Teams trust the platform: safe rollouts, clear runbooks, and measurable quality/cost improvements are in place.

Job ID: 141755365