Computer Vision & Multimodal LLM Intern (Drawing Change Analysis Agent)
About Doaz
Doaz turns fragmented industrial knowledge into instant, actionable insight. We build LLM and Vision-AI solutions for construction, heavy industry, and finance, helping teams convert drawings, specs, and regulations into real-time decisions. We're expanding our GeoAI programs (including joint work with POSCO E&C) and launching drawing-change detection services that compare plan versions, detect deltas, and explain design impacts.
Why You'll Love Working Here
- Ship real things: Your models and tools can reach production pilots in weeks.
- Mentorship, not bureaucracy: Learn directly from senior CV/LLM engineers and domain SMEs.
- Global crew: 30 teammates across KR/PK/IN; English-first collaboration.
- Tech playground: hands-on work with YOLO/RT-DETR, Gemma-VL/Qwen-VL/LLaVA, PaddleOCR, LayoutLMv3, and Triton.
Role Overview
As a CV & Multimodal LLM Intern, you'll support the end-to-end development of a version-aware drawing-diff engine (PDF/DWG raster & vector), symbol/text extraction, and change-impact narratives powered by RAG/LLM. You'll prototype, evaluate, and iterate with fast feedback from real engineering users.
What You'll Do (Intern Scope)
- Drawing Change Analysis (CV): assist with rasterization, layer parsing, and vector geometry operations; train and evaluate detection and segmentation models (YOLOv8/RT-DETR/SAM); implement geometry-aware post-processing (IoU, topology, snapping).
- Document & Layout Understanding: combine OCR (PaddleOCR/Tesseract) with layout models (DocFormer/LayoutLMv3/Donut); normalize to structured JSON; help with version-aware entity tracking (gridlines, BH IDs, coordinates).
- GeoAI & LLM/RAG: set up hybrid retrieval (BM25 + vector search, with reranking); ground LLM answers with citations and clickable evidence; draft change-impact summaries using rule-based prompts plus LLM verification.
- Productization Basics: package prototypes as FastAPI services or notebooks; write READMEs; contribute datasets, labeling guides, and simple A/B or ablation tests.
Minimum Qualifications
- BS/MS student or recent graduate in CS/EE/CE/Geoinformatics/Civil (or similar).
- Solid Python (3.x); foundations in data structures/algorithms, linear algebra, and probability.
- Coursework/projects in CV and/or document AI (detection, segmentation, OCR, layout).
- Familiar with PyTorch or TensorFlow; Git, Linux, Jupyter.
- Clear written English; high learning velocity and ownership.
Nice to Have
- Hands-on with YOLO/RT-DETR/Detectron2/SAM; PaddleOCR/Tesseract; LayoutLMv3/Donut.
- Exposure to VLMs (Gemma-VL, Qwen-VL, LLaVA), CLIP, rerankers.
- Experience with engineering drawings/CAD/PDF toolchains.
- Basic FastAPI, Docker, ONNX/TensorRT/Triton.
- Frontend (TypeScript/React) for quick review UIs.
Internship Details & Benefits
- Type/Duration: Paid internship — 4 months (full-time preferred).
- Compensation (India): Stipend prorated from 6 LPA (INR 600,000 annualized), paid monthly (INR 50,000/month during the internship).
- For candidates outside India, compensation will be benchmarked to local market equivalents.
- Conversion: High performers will receive a full-time offer upon successful completion of the 4-month internship.
- Perks: Mentorship, cloud/GPU credits, real production impact.
Hiring Process (fast)
- Intro call (15–20 min).
- 48-hour mini task: simple drawing diff or OCR/layout extraction + short README (clarity > polish).
- Tech chat (45–60 min): approach, trade-offs, evaluation.
- Founder chat on culture & goals.
- Offer.
How to Apply
Email [Confidential Information] with subject [CV/LLM Intern – Your Name] and include:
- Résumé/CV (highlight courses/projects; metrics if available).
- GitHub or demo links (CV/doc-AI/RAG preferred).
- Availability (start date, weekly hours).
- (Optional) A one-page diagram of your Drawing Revision → Detection → Evidence → LLM Narrative pipeline.
Ready to learn fast and turn messy drawings into trusted intelligence? Join Doaz and build with us.