Job Description:
We are looking for a Lead Generative AI Engineer with 3–5 years of experience to spearhead the development of cutting-edge AI systems involving Large Language Models (LLMs), Vision-Language Models (VLMs), and Computer Vision (CV). You will lead model development, fine-tuning, and optimization for text, image, and multi-modal use cases. This is a hands-on leadership role that requires a deep understanding of transformer architectures, generative model fine-tuning, prompt engineering, and deployment in production environments.
Roles and Responsibilities:
- Lead the design, development, and fine-tuning of LLMs for tasks such as text generation, summarization, classification, Q&A, and dialogue systems.
- Develop and apply Vision-Language Models (VLMs) for tasks like image captioning, VQA, multi-modal retrieval, and grounding.
- Work on Computer Vision tasks including image generation, detection, segmentation, and manipulation using SOTA deep learning techniques.
- Leverage architectures and models such as transformers, diffusion models, and CLIP to build and fine-tune multi-modal models.
- Fine-tune open-source LLMs and VLMs (e.g., LLaMA, Mistral, Gemma, Qwen, MiniGPT, Kosmos) on task-specific or domain-specific datasets; a minimal fine-tuning sketch follows this list.
- Design data pipelines, model training loops, and evaluation metrics for generative and multi-modal AI tasks.
- Optimize models for inference using techniques such as quantization, LoRA, and efficient transformer variants (a quantization sketch appears after the Requirements section).
- Collaborate cross-functionally with product, backend, and ML ops teams to ship models into production.
- Stay current with the latest research and incorporate emerging techniques into product pipelines.
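For concreteness, here is a minimal sketch of the adapter-based fine-tuning described above, using Hugging Face transformers, datasets, and peft. The model name, data file, and hyperparameters are illustrative placeholders, not specifics of this role:

```python
# Minimal LoRA fine-tuning sketch (model, data file, and hyperparameters
# are placeholders for illustration, not requirements of the role).
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "mistralai/Mistral-7B-v0.1"   # any open-weights causal LM
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # Mistral ships without a pad token
model = AutoModelForCausalLM.from_pretrained(model_name,
                                             torch_dtype=torch.bfloat16)

# Wrap the frozen base model with low-rank adapters; only a fraction of a
# percent of the parameters are trained.
lora = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()

# Tokenize a task-specific plain-text corpus (placeholder file name).
ds = load_dataset("text", data_files={"train": "train.txt"})["train"]
ds = ds.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
            batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="lora-out",
                           per_device_train_batch_size=2,
                           gradient_accumulation_steps=8,
                           num_train_epochs=1,
                           learning_rate=2e-4,
                           logging_steps=10),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("lora-out")  # persists only the small adapter weights
```

Because only the adapter weights are updated, this style of fine-tuning fits on hardware that could not hold full-parameter training of the same model.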
Requirements:
- Bachelor's or Master's degree in Computer Science, Artificial Intelligence, Machine Learning, or a related field.
- 3–5 years of hands-on experience building, training, and deploying deep learning models, especially in the LLM, VLM, and/or CV domains.
- Strong proficiency with Python, PyTorch (or TensorFlow), and libraries such as Hugging Face Transformers and Datasets, OpenCV, and LangChain.
- Deep understanding of the transformer architecture, self-attention mechanisms, tokenization, embeddings, and diffusion models.
- Experience with LoRA, PEFT, RLHF, prompt tuning, and transfer learning techniques.
- Experience with multi-modal datasets and fine-tuning vision-language models (e.g., BLIP, Flamingo, MiniGPT, Kosmos).
- Familiarity with MLOps tools, containerization (Docker), and model deployment workflows (e.g., Triton Inference Server, TorchServe).
- Strong problem-solving, architectural thinking, and team mentorship skills.
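To illustrate the inference-optimization point above, here is a minimal sketch of post-training dynamic quantization with stock PyTorch. The checkpoint is a small classifier chosen only so the example runs on a CPU; large LLMs are more commonly served with 4-/8-bit weight-only quantization (e.g., bitsandbytes) or an optimized runtime, but the principle is the same:

```python
# Minimal dynamic-quantization sketch (checkpoint is illustrative).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name).eval()

# Convert nn.Linear weights to int8; activations are quantized on the fly.
# No retraining is needed, and those layers shrink roughly 4x in memory.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

inputs = tokenizer("Quantization keeps latency within budget.",
                   return_tensors="pt")
with torch.no_grad():
    # Should normally match the FP32 model's predicted label.
    print(quantized(**inputs).logits.argmax(dim=-1))
```

Dynamic quantization trades a small amount of accuracy for memory and CPU latency; the same idea carries over to LoRA-adapter merging and 4-bit loading when serving larger models.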