Role Description
We are hiring a Senior Computer Vision Research Engineer to design and deploy scalable, low-latency video analytics systems for large-scale CCTV networks. Core focus: building best-in-class Vision-Language Models (VLMs) optimized for edge deployment, enabling multimodal reasoning (visual question answering, semantic search, event description) in resource-constrained environments.
Key Responsibilities:
- Architect end-to-end pipelines: multi-object tracking (MOT), re-identification (Re-ID), action/anomaly detection, scene understanding.
- Develop and optimize sub-2B-parameter VLMs for edge (e.g., surpassing Moondream2/Qwen2-VL on benchmarks) using quantization-aware training (QAT), post-training quantization (PTQ), pruning, distillation, and efficient architectures.
- Scale real-time processing of thousands of streams with sub-second latency.
- Profile and resolve bottlenecks in video analytics and multimodal systems.
- Optimize for edge hardware (Jetson, Coral, Hailo) via TensorRT/OpenVINO/TVM.
- Design hybrid cloud-edge architectures and production monitoring.
Qualifications:
- 3+ years of industry experience developing and deploying computer vision systems for video analytics at scale.
- Proven track record of production deployments across large-scale camera networks, including the full lifecycle from prototyping to monitoring.
- Demonstrated expertise in building and optimizing Vision-Language Models (VLMs) for edge environments, with hands-on experience in architectures like unified embedding, cross-modality attention, or efficient variants (e.g., SmolVLM, LFM2-VL, MobileVLM).
- Deep understanding of performance bottlenecks in contemporary video analytics and VLM systems (e.g., GPU/CPU saturation, PCIe bandwidth contention, codec latency, accuracy drift due to domain shift, high token counts in multimodal processing, and the overhead of privacy-preserving inference).
- Hands-on expertise in edge model optimization using TensorFlow Lite, ONNX Runtime, PyTorch Mobile, OpenVINO, TensorRT, or TVM, achieving 25x reductions in latency/memory while maintaining accuracy, including VLM compression techniques such as token pruning or multi-scale pooling.
- Strong proficiency in Python/C++, with extensive experience in PyTorch/TensorFlow, OpenCV, CUDA, and distributed training/inference frameworks.
- Solid foundation in modern CV architectures (Transformers, CNNs, hybrid models), real-time tracking algorithms (DeepSORT, ByteTrack, BoT-SORT), and VLM components (e.g., vision encoders like ViT, multimodal pre-training strategies).
What we offer:
- Competitive compensation package with equity.
- Comprehensive health benefits and flexible working arrangements.
- Access to cutting-edge hardware, cloud credits, and conference attendance support.
- Opportunity to shape the future of AI-powered physical security systems.