Opening For Edge AI Architect

10-16 Years

Job Description

Responsibilities:

Agentic Architecture & Interoperability

Define end-to-end Edge AI system architecture covering data acquisition, preprocessing, model execution, orchestration, and edge-cloud integration.
Evaluate and select hardware accelerators (GPU, NPU, DSP, TPU, VPU) based on workload characteristics and performance requirements.
Architect solutions using platforms such as NVIDIA Jetson, Intel OpenVINO, Qualcomm AI Engine, Arm Ethos, and Edge TPU.
Design real-time model pipelines for vision, audio, signal processing, and sensor fusion workloads.
Implement decentralized multi-agent systems using agentic frameworks and graph-based orchestration.
Design Agent-to-Agent (A2A) communication protocols for interoperability across heterogeneous environments.
Integrate Model Context Protocol (MCP) servers to securely enable agents to access enterprise data, tools, and services.

Generative AI & Small Language Model Customization

Lead selection and customization of Small Language Models (SLMs) for domain-specific use cases.
Apply parameter-efficient fine-tuning techniques such as LoRA and QLoRA to optimize compute efficiency.
Adapt models for on-device intelligence and enterprise-grade agentic workflows.

Edge AI & Inference Optimization

Optimize AI models for deployment on resource-constrained devices including smartphones, smart glasses, wearables, IoT gateways, and embedded Linux systems.
Implement Post-Training Quantization (PTQ), Quantization-Aware Training (QAT), pruning, and sparsity techniques.
Optimize inference using TFLite, PyTorch Mobile, and ONNX Runtime with hardware acceleration support (NPU/DSP).
Perform advanced optimizations including INT8/INT4 quantization, mixed precision, KV-cache optimization, speculative decoding, and batch and streaming inference tuning.
Profile and optimize inference pipelines across CPU, GPU, NPU, and DSP to reduce cold-start latency and enhance real-time responsiveness.

Embedded & System Integration

Develop high-performance inference engines and middleware in C/C++ to interface AI models with sensors and actuators.
Build Android-native AI services using Java/Kotlin and Android NDK with optimized background execution and battery efficiency.
Ensure seamless integration between AI workloads and embedded hardware platforms.