
  • Posted 8 days ago

Job Description

SDE 2 / SDE 3: AI Infrastructure & LLM Systems Engineer

Location: Pune / Bangalore (India)

Experience: 4-8 years

Compensation: no bar for the right candidate

Bonus: Up to 10% of base

About The Company

AbleCredit builds production-grade AI systems for BFSI enterprises, reducing OPEX by up to 70% across onboarding, credit, collections, and claims.

We run our own LLMs on GPUs, operate high-concurrency inference systems, and build AI workflows that must scale reliably under real enterprise traffic.

Role Summary (What We're Really Hiring For)

We are looking for a strong backend / systems engineer who can:

  • Deploy AI models on GPUs
  • Expose them via APIs
  • Scale inference under high parallel load using async systems and queues
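To make "async systems and queues" concrete, here is a minimal Python sketch (an illustration, not AbleCredit's actual stack). It assumes a hypothetical `fake_infer` coroutine standing in for a request to a GPU inference server. A bounded `asyncio.Queue` provides backpressure: producers wait while the queue is full. A fixed pool of workers caps how many requests hit the GPUs at once.

```python
import asyncio
import random


async def fake_infer(prompt: str) -> str:
    # Hypothetical stand-in for a GPU inference call (e.g. an HTTP request
    # to an inference server); it just sleeps briefly and echoes the prompt.
    await asyncio.sleep(random.uniform(0.01, 0.03))
    return f"completion for {prompt!r}"


async def worker(queue: asyncio.Queue, results: dict) -> None:
    # Each worker pulls one job at a time, so num_workers bounds concurrency.
    while True:
        job_id, prompt = await queue.get()
        try:
            results[job_id] = await fake_infer(prompt)
        finally:
            queue.task_done()


async def run(prompts: list, num_workers: int = 4, max_queue: int = 8) -> dict:
    # Bounded queue = backpressure: put() waits while the queue is full,
    # so producers cannot outrun the workers (and the GPUs behind them).
    queue: asyncio.Queue = asyncio.Queue(maxsize=max_queue)
    results: dict = {}
    workers = [asyncio.create_task(worker(queue, results)) for _ in range(num_workers)]
    for job_id, prompt in enumerate(prompts):
        await queue.put((job_id, prompt))
    await queue.join()  # wait until every queued job has been processed
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results


if __name__ == "__main__":
    out = asyncio.run(run([f"doc-{i}" for i in range(20)]))
    print(len(out))
```

In production the in-process queue would typically be replaced by one of the brokers named below (Redis, Kafka, SQS, etc.), but the backpressure idea is the same.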

This is not a prompt-engineering or UI-AI role.

Core Responsibilities

  • Deploy and operate LLMs on GPU infrastructure (cloud or on-prem).
  • Run inference servers such as vLLM / TGI / SGLang / Triton or equivalents.
  • Build FastAPI / gRPC APIs on top of AI models.
  • Design async, queue-based execution for AI workflows (fan-out, retries, backpressure).
  • Plan and reason about capacity & scaling:
      ◦ GPU count vs. RPS
      ◦ batching vs. latency
      ◦ cost vs. throughput
  • Add observability around latency, GPU usage, queue depth, failures.
  • Work closely with AI researchers to productionize models safely.
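A rough capacity estimate of the kind described above, sketched in Python. All names and numbers are hypothetical. The sketch counts only generated (decode) tokens and ignores prefill. It assumes per-GPU throughput was measured at a batch size that already meets the latency target, since larger batches usually raise throughput but also per-request latency. `target_util` leaves headroom for bursts.

```python
import math
from dataclasses import dataclass


@dataclass
class Workload:
    peak_rps: float               # requests per second at peak
    output_tokens_per_req: float  # average generated tokens per request


@dataclass
class GpuProfile:
    tokens_per_sec: float  # measured decode throughput of one GPU at the chosen batch size
    hourly_cost: float     # cost of one GPU-hour (any currency)


def gpus_needed(w: Workload, g: GpuProfile, target_util: float = 0.7) -> int:
    """Required token throughput divided by usable per-GPU throughput, rounded up."""
    required_tps = w.peak_rps * w.output_tokens_per_req
    usable_tps = g.tokens_per_sec * target_util  # keep headroom below 100% utilization
    return math.ceil(required_tps / usable_tps)


def cost_per_million_tokens(g: GpuProfile, target_util: float = 0.7) -> float:
    """GPU-hour cost spread over the tokens one GPU produces per hour at target_util."""
    tokens_per_hour = g.tokens_per_sec * target_util * 3600
    return g.hourly_cost / tokens_per_hour * 1_000_000


if __name__ == "__main__":
    w = Workload(peak_rps=20, output_tokens_per_req=300)
    g = GpuProfile(tokens_per_sec=2500, hourly_cost=2.0)
    print(gpus_needed(w, g))
    print(round(cost_per_million_tokens(g), 3))
```

With these illustrative inputs the peak demand is 20 × 300 = 6,000 tokens/s against 2,500 × 0.7 = 1,750 usable tokens/s per GPU. That is about 3.4 GPUs, so 4 are needed.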

Must-Have Skills

  • Strong backend engineering fundamentals (distributed systems, async workflows).
  • Hands-on experience running GPU workloads in production.
  • Proficiency in Python (Golang acceptable).
  • Experience with Docker + Kubernetes (or equivalent).
  • Practical knowledge of queues / workers (Redis, Kafka, SQS, Celery, Temporal, etc.).
  • Ability to reason quantitatively about performance, reliability, and cost.

Strong Signals (Recruiter Screening Clues)

Look For Candidates Who Have

  • Personally deployed models on GPUs
  • Debugged GPU memory / latency / throughput issues
  • Scaled compute-heavy backends under load
  • Designed async systems instead of blocking APIs
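The "async instead of blocking" pattern, sketched in Python under stated assumptions. The API enqueues a job and returns an id at once (similar to an HTTP 202 Accepted response) rather than holding the connection open while the GPU works. Background workers run the job with retries and exponential backoff, and the client polls for the result. `JobService`, `call_model`, and `TransientError` are hypothetical names. A real service would keep job state in a durable store or workflow engine (e.g. Redis or Temporal) rather than in-memory dicts.

```python
import asyncio
import uuid
from typing import Dict, Optional, Tuple


class TransientError(Exception):
    """Hypothetical retryable failure (e.g. a timeout or GPU OOM)."""


async def call_model(prompt: str, attempt: int) -> str:
    # Hypothetical stand-in for an inference call: prompts containing "flaky"
    # fail on the first attempt so the retry path is exercised.
    await asyncio.sleep(0.01)
    if "flaky" in prompt and attempt == 1:
        raise TransientError("simulated timeout")
    return prompt.upper()


class JobService:
    def __init__(self, max_attempts: int = 3, base_delay: float = 0.01) -> None:
        self.queue: asyncio.Queue = asyncio.Queue()
        self.status: Dict[str, str] = {}
        self.results: Dict[str, str] = {}
        self.max_attempts = max_attempts
        self.base_delay = base_delay

    async def submit(self, prompt: str) -> str:
        """Enqueue the job and return its id at once instead of blocking on inference."""
        job_id = uuid.uuid4().hex
        self.status[job_id] = "queued"
        await self.queue.put((job_id, prompt))
        return job_id

    def poll(self, job_id: str) -> Tuple[str, Optional[str]]:
        """Return (status, result); result is None until the job is done."""
        return self.status[job_id], self.results.get(job_id)

    async def _run_with_retries(self, job_id: str, prompt: str) -> None:
        self.status[job_id] = "running"
        for attempt in range(1, self.max_attempts + 1):
            try:
                self.results[job_id] = await call_model(prompt, attempt)
                self.status[job_id] = "done"
                return
            except TransientError:
                if attempt < self.max_attempts:
                    # Exponential backoff: base, 2*base, 4*base, ...
                    await asyncio.sleep(self.base_delay * 2 ** (attempt - 1))
        self.status[job_id] = "failed"

    async def worker(self) -> None:
        while True:
            job_id, prompt = await self.queue.get()
            try:
                await self._run_with_retries(job_id, prompt)
            finally:
                self.queue.task_done()


async def demo(prompts, num_workers: int = 2, max_attempts: int = 3):
    svc = JobService(max_attempts=max_attempts)
    workers = [asyncio.create_task(svc.worker()) for _ in range(num_workers)]
    ids = {}
    for p in prompts:
        ids[p] = await svc.submit(p)  # returns immediately with a job id
    await svc.queue.join()  # a real client would poll instead of joining
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return {p: svc.poll(job_id) for p, job_id in ids.items()}


if __name__ == "__main__":
    print(asyncio.run(demo(["hello", "flaky doc"])))
```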

Nice to Have

  • Familiarity with LangChain / LlamaIndex (as infra layers, not just usage).
  • Experience with vector DBs (Qdrant, Pinecone, Weaviate).
  • Prior work on multi-tenant enterprise systems.

Not a Fit If

  • Only experience is calling OpenAI / Anthropic APIs.
  • Primarily a prompt engineer or frontend-focused AI dev.
  • No hands-on ownership of infra, scaling, or production reliability.

Skills: Large Language Models (LLMs), LLMOps, Generative AI, LLM tuning

More Info

Job ID: 144936401