
Principal Engineer - Generative AI Infra Capabilities

  • Posted 5 hours ago

Job Description

About This Role

Wells Fargo is seeking a Principal Engineer - Generative AI GPU Infrastructure Capabilities.

In This Role, You Will

  • Act as an advisor to leadership to develop or influence applications, network, information security, database, operating systems, or web technologies for highly complex business and technical needs across multiple groups
  • Lead the strategy and resolution of highly complex and unique challenges requiring in-depth evaluation across multiple areas or the enterprise, delivering solutions that are long-term, large-scale and require vision, creativity, innovation, advanced analytical and inductive thinking
  • Translate advanced technology experience, an in-depth knowledge of the organization's tactical and strategic business objectives, the enterprise technological environment, the organization structure, and strategic technological opportunities and requirements into technical engineering solutions
  • Provide vision, direction and expertise to leadership on implementing innovative and significant business solutions
  • Maintain knowledge of industry best practices and new technologies and recommend innovations that enhance operations or provide a competitive advantage to the organization
  • Strategically engage with all levels of professionals and managers across the enterprise and serve as an expert advisor to leadership

Required Qualifications

  • 7+ years of Engineering experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education

Desired Qualifications

  • Design GPU cluster topologies (H100/H200, NVLink/NVSwitch), networking, and storage paths for high-throughput inferencing; document sizing and performance baselines.
  • Implement Run:ai constructs (Collections/Departments/Projects/workloads) for MDEV/MDEP/UCEP/MRM; codify quota, priority, and fair-share policies.
  • POC and benchmark disaggregated inferencing (prefill/decode) with vLLM/TensorRT-LLM; publish guidance for H100/H200 tuning (FP8/INT8/AWQ) and KV-transfer behavior over NVLink.
  • Operationalize OpenShift AI parity for GPU scheduling, time-slicing/MIG profiles, and preemption; validate upgrade paths and Helm/Kustomize packaging.
  • Integrate Triton Inference Server for multi-model serving; standardize model repository structure, batching, dynamic shapes, and telemetry hooks.
  • Harden NGDC environments with AVI/GSLB patterns (Prod1/Prod2) and BCP; execute DR failover runbooks and steady-state capacity planning.
  • Publish steady-state runbooks (deploy → certify → promote): DEV → UAT → MDEP-Beta → MDEP-GA / UCEP; define promotion criteria and risk exceptions.
  • Own endpoint productionization via Apigee (AI Gateway): authN/Z, rate limiting, API SLAs, versioning/deprecation, and SDK generation for internal consumers.
  • Embed observability/evaluations with Overwatch + Arize: prompt/agent/tool tracing, SLO dashboards, alerting, and data-retention/export workflows.
  • Automate CI/CD for infra and model artifacts: image scanning (JFrog remote repo), chart releases, canaries, and rollback plans across OCP/GKE.
  • Tune CUDA kernels/graph execution paths; profile NCCL collectives; resolve performance bottlenecks (HBM bandwidth, kernel fusion, P2P comms).
  • Qualify LLM/SLM runtimes (Gemma, Llama, GPT-OSS, etc.) with Run:ai scheduling; publish per-model recipes for throughput, latency, cost, and stability.
  • Define GPU estate hygiene: image provenance, secrets handling, namespace/network policy baselines, and change controls for upgrades (e.g., Run:ai v2.21+).
  • Partner with product/TPM/PO to align backlog to platform milestones (OpenShift AI go-forward, SuperPOD activation waves, endpoint rollouts).
  • Mentor engineers; lead deep-dive reviews and present in exec/tech forums (CIO/ARB/offsites) with architecture readouts, performance data, and risk mitigations.
  • NVIDIA & CUDA: CUDA/cuDNN usage, NVLink/NVSwitch understanding, MIG setup, NCCL tuning, GPU profiling, H100/H200 optimization. Optimize kernels and collectives, choose MIG profiles, validate interconnect bandwidth and NUMA/PCIe topology for LLM/SLM workloads.
  • LLM/SLM Runtimes: Work with vLLM, TensorRT-LLM, Triton; apply FP8/INT4 quantization; tune KV-cache strategies. Build POCs for disaggregated prefill/decode, standardize Triton repos, and optimize batching.
  • Orchestration: Use Run:ai structures (Collections/Departments/Projects); manage OCP/GKE environments. Implement GPU allocation patterns; enforce quotas, preemption, and fair-share scheduling.
  • OpenShift AI: Configure RHOAI GPU scheduling and time-slicing; use Helm/Kustomize; validate upgrades. Achieve platform parity, certify charts and policies, and ensure admission controls function reliably.
  • API & Gateway: Apply Apigee authN/Z; manage quotas, rate limits, OpenAPI specs, SDK generation, and SLA operations. Productionize model endpoints, manage versioning and deprecation, and enforce gateway-level SLAs.
  • Observability & Evaluation: Use Overwatch + Arize for tracing and evals, define SLOs, alerts, retention/export processes. Trace prompts/tools/agents, enforce data retention, publish standardized dashboards.
  • CI/CD & Artifacts: Manage JFrog repos, image scanning, helm releases, canary/rollback workflows. Standardize artifact flow, automate safe promotions, ensure compliant releases.
  • Performance Engineering: Model throughput/latency; optimize tokens/sec, batch shaping, and cache policies. Produce per-model performance recipes; tune cost/performance trade-offs for LLM/SLM.
  • Controls & SDLC: Apply JAD lite practices, manage change controls, secrets hygiene, namespace/network policies. Maintain compliance across GPU estate, ensure full auditability and proper access boundaries.
  • Communication: Create executive-friendly narratives, write architectures and runbooks, and present in forums. Deliver content in offsites/CIO forums; publish clear decision memos.
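The Performance Engineering bullet above (throughput/latency modeling, batch shaping) can be sketched as a back-of-the-envelope calculation. A minimal illustration; the function names and the batch sizes/latencies are hypothetical, not taken from this posting:

```python
# Minimal sketch of batched-decode throughput/latency modeling.
# In a batched LLM decode loop, each step emits one token per sequence,
# so aggregate tokens/sec scales with batch size until the per-step
# latency grows (compute- or HBM-bandwidth-bound).

def decode_throughput(batch_size: int, step_latency_ms: float) -> float:
    """Aggregate tokens/sec across the batch for one decode step."""
    return batch_size * 1000.0 / step_latency_ms

def request_latency_s(output_tokens: int, step_latency_ms: float) -> float:
    """End-to-end decode latency for a single request in the batch."""
    return output_tokens * step_latency_ms / 1000.0

# Illustrative trade-off: a larger batch raises aggregate throughput
# even though each step (and thus each request) gets slower.
low = decode_throughput(batch_size=1, step_latency_ms=10.0)    # 100.0 tok/s
high = decode_throughput(batch_size=32, step_latency_ms=16.0)  # 2000.0 tok/s
```

A per-model "recipe" in this style would pick the batch size where aggregate tokens/sec gains flatten while per-request latency still meets the SLO.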

Reference Number

R-516638



Job ID: 144355089