Responsibilities:
Agentic Architecture & Interoperability
- Define end-to-end Edge AI system architecture covering data acquisition, preprocessing, model execution, orchestration, and edge-cloud integration.
- Evaluate and select hardware accelerators (GPU, NPU, DSP, TPU, VPU) based on workload characteristics and performance requirements.
- Architect solutions using platforms such as NVIDIA Jetson, Intel OpenVINO, Qualcomm AI Engine, ARM Ethos, and Edge TPU.
- Design real-time model pipelines for vision, audio, signal processing, and sensor fusion workloads.
- Implement decentralized multi-agent systems using agentic frameworks and graph-based orchestration.
- Design Agent-to-Agent (A2A) communication protocols for interoperability across heterogeneous environments.
- Integrate Model Context Protocol (MCP) servers to securely enable agents to access enterprise data, tools, and services.
Generative AI & Small Language Model Customization
- Lead selection and customization of Small Language Models (SLMs) for domain-specific use cases.
- Apply parameter-efficient fine-tuning techniques such as LoRA and QLoRA to optimize compute efficiency.
- Adapt models for on-device intelligence and enterprise-grade agentic workflows.
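For candidates less familiar with parameter-efficient fine-tuning, the idea behind LoRA can be sketched in a few lines. This is a minimal illustration in plain Python, not a production recipe. The matrix sizes, the rank `r`, the scaling factor `alpha`, and the helper names are all assumptions chosen for the example. The base weight stays frozen, and only two small matrices are trained.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_delta(A, B, alpha, r):
    """Return the low-rank update (alpha / r) * (B @ A)."""
    scale = alpha / r
    return [[scale * v for v in row] for row in matmul(B, A)]

def merge(W, delta):
    """Fold the low-rank update into the frozen base weight."""
    return [[w + d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]

# Hypothetical sizes for illustration only.
d_out, d_in, r, alpha = 8, 8, 2, 4
rng = random.Random(0)

W = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen base weight
A = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]   # trainable, r x d_in
B = [[0.0] * r for _ in range(d_out)]                                # trainable, d_out x r, zero-initialised

full_params = d_out * d_in        # parameters touched by full fine-tuning
lora_params = r * (d_in + d_out)  # parameters touched by LoRA

# With B initialised to zero, the merged weight equals the base weight,
# so training starts from the unmodified model.
W_merged = merge(W, lora_delta(A, B, alpha, r))
```

At these toy sizes the saving is modest, but the trainable parameter count grows with `r * (d_in + d_out)` rather than `d_in * d_out`, which is what makes the approach practical for on-device and domain-specific SLM adaptation.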
Edge AI & Inference Optimization
- Optimize AI models for deployment on resource-constrained devices including smartphones, smart glasses, wearables, IoT gateways, and embedded Linux systems.
- Implement Post-Training Quantization (PTQ), Quantization-Aware Training (QAT), pruning, and sparsity techniques.
- Optimize inference using TFLite, PyTorch Mobile, and ONNX Runtime with hardware acceleration support (NPU/DSP).
- Perform advanced optimizations including INT8/INT4 quantization, mixed precision, KV-cache optimization, speculative decoding, and batch and streaming inference tuning.
- Profile and optimize inference pipelines across CPU, GPU, NPU, and DSP to reduce cold-start latency and enhance real-time responsiveness.
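The quantization work listed above can be illustrated with a small sketch of symmetric, per-tensor INT8 post-training quantization. It is written in plain Python under simplifying assumptions: a single scale per tensor, no calibration data, and made-up weight values. Real toolchains such as TFLite or ONNX Runtime add per-channel scales, zero points, and calibration.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization to the int8 range [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the representable range.
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to approximate float values."""
    return [x * scale for x in q]

# Hypothetical weights for illustration only.
weights = [0.82, -1.27, 0.05, 0.0, 0.64, -0.33]

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Round-to-nearest keeps the per-element error within half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

The largest-magnitude weight maps to the edge of the int8 range, and every other value is stored as an integer multiple of `scale`. The trade-off is a bounded rounding error in exchange for weights stored in 8 bits instead of 32.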
Embedded & System Integration
- Develop high-performance inference engines and middleware in C/C++ to interface AI models with sensors and actuators.
- Build Android-native AI services using Java/Kotlin and Android NDK with optimized background execution and battery efficiency.
- Ensure seamless integration between AI workloads and embedded hardware platforms.