Lead the design and deployment of enterprise-grade generative AI systems, driving innovation in LLM orchestration, multimodal architectures, and scalable AI/ML pipelines. Own the full lifecycle from research to production, ensuring alignment with business objectives and ethical AI standards. This is a hands-on individual contributor role that also includes providing technical guidance to junior developers.
Key Responsibilities
- Technical Leadership
  - Architect multi-LLM systems (e.g., Mixture-of-Experts, LLM routing) for cost-performance optimization.
  - Design GPU/TPU-optimized training pipelines (FSDP, DeepSpeed) for billion-parameter models.
- Cloud-Native AI Development
  - Build multi-cloud GenAI platforms (Azure OpenAI + GCP Vertex AI + AWS Bedrock) with unified MLOps.
  - Implement enterprise security: VPC peering, private model endpoints, and data residency compliance.
- Innovation & Strategy
  - Pioneer GenAI use cases: agentic workflows, AI-driven synthetic data generation, real-time fine-tuning.
  - Establish AI governance frameworks: model cards, drift monitoring, and red-teaming protocols.
- Cross-Functional Impact
  - Partner with leadership to define AI roadmaps and ROI metrics (e.g., $ saved via AI-driven automation).
  - Mentor junior engineers and evangelize GenAI best practices across the organization.
Qualifications
- Education: Bachelor's/Master's in CS/AI or equivalent industry experience (5+ years in ML, 2+ in GenAI).
- Technical Mastery:
  - Languages: Python.
  - Frameworks: Expert-level PyTorch, TensorFlow Extended (TFX), ONNX Runtime.
  - Cloud: Certified as Microsoft Azure AI Engineer Associate and/or Google Cloud Professional Machine Learning Engineer.
- GenAI Expertise:
  - Shipped production GenAI systems (e.g., 10k+ QPS chatbots, code autocomplete at GitHub Copilot scale).
  - Advanced prompt/response engineering: self-critique chains, LLM cascades, guardrail-driven generation.
Must-Have Experience
- Cloud AI experience:
  - Azure: Designed solutions with Azure OpenAI, MLOps Pipelines, and Cognitive Search.
  - GCP: Scaled Vertex AI LLM Evaluation, Gemini Multimodal, and TPU v5 Pods.
- High-Impact Projects:
  - Delivered automation projects that significantly reduced costs.
  - Built RAG systems with hybrid search (vector + lexical) and dynamic data hydration.
  - Led AI compliance for regulated industries (healthcare, finance).
Preferred Qualifications
- Certifications:
  - Azure: Microsoft Certified: Azure AI Engineer Associate.
  - GCP: Google Cloud Professional Machine Learning Engineer.
- Experience with hybrid/multi-cloud GenAI deployments (e.g., training on GCP TPUs, serving via Azure endpoints).