
We're looking for a highly experienced Lead Architect to own the end-to-end design and governance of a large-scale, fully on-premises AI and distributed systems platform within a critical infrastructure environment. This role blends deep technical expertise in GPU infrastructure, Kubernetes, and AI/ML systems with domain knowledge in energy systems and operational technology (OT).
You will drive architecture decisions across compute, data, and integration layers while ensuring compliance with stringent regulatory and security requirements.
Mandatory Skills
Distributed systems architecture — 100% on-premises large-scale deployments
Kubernetes (self-managed on-premises clusters with the NVIDIA GPU Operator; hands-on on-prem operations, not only cloud-managed K8s)
AI/ML system design (LLMs, RAG, and agentic frameworks, including NVIDIA NeMo Agent Toolkit)
Energy domain OR critical infrastructure background (SCADA, EMS, grid ops)
API design (REST, gRPC, event-driven / Kafka)
NERC CIP / OT-IT security fundamentals
Job ID: 146565221