intozi

AI Engineer – Vision Language Models & Agentic Systems

  • Posted 2 hours ago

Job Description

AI Engineer – Vision Language Models & Agentic Systems (2–3 Years Experience)

About the Role

We're looking for an AI Engineer with 2–3 years of hands-on experience in AI/ML, particularly in deploying and optimizing Vision Language Models (VLMs), Large Language Models (LLMs), or related generative AI systems.

You'll build and deploy vision-language systems that run efficiently across the full hardware spectrum—from constrained edge devices to multi-GPU servers. You'll own the model loading, optimization, and inference stack while building agentic AI systems that transform visual understanding into actionable intelligence.

Our deployments are fully self-hosted, often air-gapped, and run directly on bare metal—no cloud dependencies. This is a hands-on role for someone who enjoys working close to the hardware, optimizing GPU performance, and designing intelligent AI systems that solve real-world problems.

What You'll Do

  • Deploy and optimize Vision Language Models (VLMs) for production inference across edge accelerators, embedded GPUs, and datacenter-class GPUs.
  • Own the end-to-end model loading and inference pipeline, including quantization, memory budgeting, KV-cache management, and throughput/latency optimization.
  • Build AI agents that combine visual perception with tool use, retrieval, and structured reasoning to automate complex workflows.
  • Design, benchmark, and optimize inference-serving strategies, including batching, process isolation, threading, and independent CUDA contexts.
  • Deploy and maintain AI systems in fully offline, bare-metal, and air-gapped environments without relying on cloud services.
  • Port and optimize models across runtimes such as PyTorch/Transformers, vLLM, and ONNX Runtime with quantization-aware deployment strategies.
  • Profile and troubleshoot GPU-level performance issues, including VRAM utilization, CUDA kernels, precision tradeoffs, and runtime bottlenecks.
  • Integrate AI outputs with downstream systems such as databases, vector search, analytics platforms, and business workflows.

What We're Looking For

  • 2–3 years of hands-on experience building and deploying AI/ML systems, with practical experience working on VLMs, LLMs, or other generative AI models.
  • Strong Python programming skills and experience deploying AI models in production.
  • Solid understanding of GPU inference, including VRAM management, CUDA contexts, memory optimization, and performance tuning.
  • Hands-on experience with model quantization techniques such as FP8, INT8 (SmoothQuant/W8A8), and 4-bit weight-only quantization methods like AWQ.
  • Experience deploying models across a range of hardware, from resource-constrained edge devices to enterprise GPU servers.
  • Experience working in bare-metal, on-premise, or air-gapped environments without cloud-managed infrastructure.
  • Familiarity with inference frameworks such as Hugging Face Transformers, vLLM, and ONNX Runtime.
  • Understanding of VLM-specific challenges, including vision encoder activations, image-token KV-cache growth, and mixed-precision inference.
  • Experience building AI agents with tool calling, orchestration, retrieval, and multi-step reasoning.
  • Strong debugging and problem-solving skills with a focus on production-quality AI systems.

Nice to Have

  • Experience deploying AI workloads on edge devices, including INT8 calibration and model compilation pipelines.
  • Experience with computer vision pipelines such as YOLO, custom pre/post-processing, and NMS.
  • Familiarity with vector databases and hybrid semantic + structured search.
  • Strong benchmarking and profiling discipline with a focus on optimizing real production workloads.
  • Experience working with SQL and analytics-backed systems.
  • Contributions to open-source AI projects or experience building reusable AI frameworks and tooling.

Mail ID: [Confidential Information]

Job ID: 150641255