
  • Posted 3 days ago

Job Description

Company Description:

Bhatiyani Astute Intelligence specializes in creating state-of-the-art computer vision solutions to address challenges in warehouse management, industrial monitoring, robotics, and automation. Leveraging artificial intelligence, we develop innovative 3D modeling and AI-powered systems that optimize operational efficiency, enhance security, and improve quality control for diverse industries. Our commitment to technological excellence helps businesses streamline operations and stay ahead in competitive markets.

Role Overview:

We are looking for a passionate Computer Vision Engineer with hands-on experience in Generative AI and Vision-Language Models (VLMs) to build intelligent visual understanding systems. This role requires a blend of strong engineering skills and creative problem solving to develop real-time video analytics solutions, deploy AI models on edge devices, and design multimodal AI systems that combine vision and language reasoning.

You will work on practical industry problems such as surveillance intelligence, warehouse automation, quality inspection, smart monitoring, and AI assistants for video understanding.

Key Responsibilities:

Design and develop real-time computer vision pipelines for video analytics applications.

Build solutions using Vision-Language Models (VLMs) for:

Visual question answering

Image/video captioning

Multimodal search and reasoning

Automated insights from CCTV/video streams.

Develop GenAI-driven features such as anomaly explanation, report generation, and conversational AI over video.

Deploy and optimize models on edge hardware (NVIDIA Jetson, edge GPUs, on-prem servers).

Implement low-latency inference using TensorRT, ONNX, OpenVINO or equivalent.

Work with multi-camera streaming pipelines using RTSP/ONVIF/WebRTC/GStreamer/DeepStream.

Optimize models for FPS, memory footprint, and power efficiency.

Prepare datasets, annotation strategies, evaluation metrics, and model monitoring.

Collaborate with backend teams to expose AI via APIs, microservices, and dashboards.

Required Skills & Qualifications:

Core Computer Vision:

2-4 years of hands-on experience in computer vision and deep learning.

Strong understanding of:

Object detection, tracking, segmentation

Activity recognition

Face recognition / ANPR / counting / anomaly detection

Experience building real-time video analytics systems.

GenAI & VLM (Mandatory):

Hands-on exposure to Vision-Language Models (e.g., CLIP, BLIP, LLaVA, GPT-4V style workflows).

Experience in:

Prompt engineering for vision models

Multimodal retrieval

Fine-tuning using LoRA/PEFT

Ability to combine LLM + CV for practical workflows.

Deployment & Engineering:

Experience deploying models on edge devices.

Model optimization: quantization, pruning, TensorRT.

Familiar with:

OpenCV, PyTorch/TensorFlow

Docker, FastAPI/Flask

Multi-camera architecture.

Programming:

Strong Python skills; C++ is a plus.

Experience with REST APIs and system integration.

Creativity & Problem-Solving Expectations:

We are not looking for people who only run pre-built notebooks.

The candidate must:

Demonstrate strong creativity and out-of-the-box thinking to solve vision problems where standard approaches fail.

Be able to design novel pipelines combining VLMs, GenAI, and classical CV.

Experiment with unconventional prompt strategies and multimodal reasoning.

Convert ambiguous business requirements into technical solutions without hand-holding.

Show curiosity to explore the latest research papers and adapt them for production.

Strong Plus Points:

Published research papers, technical blogs, or patents in:

Computer Vision

Vision-Language Models

Generative AI

Edge AI

Open-source contributions or Kaggle participation.

Experience reproducing research papers and converting them into deployable systems.

Side projects demonstrating original thinking rather than tutorials.

Good to Have:

Multi-camera tracking & re-identification

Experience with DeepStream / Triton / MLOps

Synthetic data generation

3D vision / point clouds

Domain experience in surveillance, warehousing, manufacturing, or smart cities.

What You Will Build:

AI systems that understand video + language together.

Edge analytics running at 20-60 FPS.

GenAI assistants for CCTV and industrial automation.

Scalable pipelines handling dozens of cameras.

Why Join Us:

Work at the intersection of Computer Vision + GenAI + Edge AI.

High ownership and freedom to experiment.

Fully remote with flexible work culture.

Opportunity to build real products, not POCs.

Education:

B.Tech / M.Tech in CS, AI, ECE or related field (or equivalent practical experience).

Compensation:

₹6,00,000 - ₹10,00,000 per annum (WFH role)

Performance bonuses based on delivery.

Laptop & internet allowance.

Medical and accidental insurance.

How to Apply:

Share your resume with:

Please include:

GitHub / portfolio (we will check)

Projects on VLM, real-time CV, or edge deployment

Any research papers / blogs if available.

More Info


Job ID: 145081987
