
ksa inc

C++ Developer || AI || HPC || Bangalore

  • Posted an hour ago

Job Description

We are seeking an experienced C++ AI Inference Engineer to design, optimize, and deploy high-performance AI inference engines using modern C++ and processor-specific optimizations. You will collaborate with research teams to productionize cutting-edge AI model architectures for CPU-based inference.

Key Responsibilities

  • Collaborate with research teams to understand AI model architectures and requirements
  • Design and implement AI model inference pipelines using C++17/20 and SIMD intrinsics (AVX2/AVX-512)
  • Optimize cache hierarchy, NUMA-aware memory allocation, and matrix multiplication (GEMM) kernels
  • Develop operator fusion techniques and CPU inference engines for production workloads
  • Write production-grade, thread-safe C++ code with comprehensive unit testing
  • Profile and debug performance using Linux tools (perf, VTune, flamegraphs)
  • Conduct code reviews and ensure compliance with coding standards
  • Stay current with HPC, OpenMP, and modern C++ best practices

Required Technical Skills

Core Requirements:

  • Modern C++ (C++17/20) with smart pointers, coroutines, and concepts
  • SIMD intrinsics - AVX2 required, AVX-512 strongly preferred
  • Cache optimization - L1/L2/L3 prefetching and locality awareness
  • NUMA-aware programming for multi-socket systems
  • GEMM/blocked matrix multiplication kernel implementation
  • OpenMP 5.0+ for parallel computing
  • Linux performance profiling (perf, valgrind, sanitizers)

Strongly Desired

  • High-performance AI inference engine development
  • Operator fusion and kernel fusion techniques
  • HPC (High-Performance Computing) experience
  • Memory management and allocation optimization

Qualifications

  • Bachelor's/Master's in Computer Science, Electrical Engineering, or related field
  • 3-7+ years proven C++ development experience
  • Linux/Unix expertise with strong debugging skills
  • Familiarity with Linear Algebra, numerical methods, and performance analysis
  • Experience with multi-threading, concurrency, and memory management
  • Strong problem-solving and analytical abilities

Preferred Qualifications

  • Knowledge of PyTorch/TensorFlow C++ backends
  • Real-time systems or embedded systems background
  • ARM SVE, RISC-V vector extensions, or Intel ISPC experience

What You Will Work On

  • Production-grade AI inference libraries powering LLMs and vision models
  • CPU-optimized inference pipelines for sub-millisecond latency
  • Cross-platform deployment across Intel Xeon, AMD EPYC, and ARM architectures
  • Performance optimizations reducing inference costs by 3-5x

Skills: multithreading, SIMD, HPC, C++, high performance computing (HPC)


Job ID: 147534487