
Responsibilities:
Deep Learning Model Conversion: Convert and adapt deep learning network architectures (e.g., from PyTorch) for deployment on various embedded platforms.
Quantization-Aware Training (QAT): Implement and fine-tune Quantization-Aware Training techniques to optimize model performance and reduce memory footprint while maintaining accuracy.
Model Optimization: Apply model optimization techniques, including pruning, quantization (post-training and QAT), and neural architecture search, to meet latency, power, and memory targets.
Runtime Integration: Integrate optimized deep learning models with embedded runtime environments and hardware accelerators.
Performance Profiling & Tuning: Analyze and profile model performance on target embedded hardware, identifying bottlenecks and implementing solutions for real-time inference.
Number Format Conversion: Work with various number formats (e.g., FP32, FP16, INT8) and develop strategies for efficient conversion and utilization on embedded processors.
Toolchain Development & Utilization: Use and help develop custom conversion tools and optimization scripts to streamline the deployment pipeline.
Skills Required:
Experience: 5+ years in embedded software development with a strong focus on AI/Machine Learning deployment.
Programming Skills: Proficient in Python for AI development and scripting.
Deep Learning Frameworks: Hands-on experience with deep learning frameworks such as PyTorch. Experience with TensorFlow/Keras is a plus.
Embedded Systems: Strong understanding of embedded system architectures, microcontrollers, DSPs, and/or FPGAs.
Optimization Techniques: Proven experience with deep learning model optimization techniques (quantization, pruning, knowledge distillation).
Number Formats: Familiarity with different number formats (e.g., FP32, FP16, INT8) and their implications for embedded inference.
Conversion Tools: Experience with model conversion formats and toolchains (e.g., ONNX, OpenVINO, TensorRT, TVM).
Problem-Solving: Excellent analytical and problem-solving skills, with a strong ability to debug and optimize complex systems.
Experience with C/C++ for embedded development.
Familiarity with hardware acceleration (e.g., NPUs, GPUs on edge devices).
Job ID: 130411605