quantalent ai

Senior Machine Learning Engineer

  • Posted 23 hours ago

Job Description

This role is with one of our clients, an AI-powered Revenue Cycle Intelligence Platform that is transforming the healthcare billing stack with autonomous medical coding, proactive denial prevention, and workflow automation solutions.

Location: Bengaluru

Experience: 2-7 years

Role: ML Engineer / Senior AI/ML Engineer

Mode: Work from Office

Key requirement: Candidates must be from an IIT

Job Overview

We are hiring an ML Engineer / Senior AI/ML Engineer to own the end-to-end applied LLM, retrieval, and evaluation layer of our healthcare AI platform. You will build production systems that automate mid- and end-revenue-cycle workflows for US healthcare, spanning coding, claim edits, denials triage, appeal generation, and payer-rule reasoning. This is a production engineering role (not research) focused on building scalable, auditable, and cost-efficient LLM systems in a regulated healthcare environment.

What You'll Own

1. Self-Hosted LLM Infrastructure

• Deploy, fine-tune, and operate open-source models (Llama, Qwen, MedGemma, and successors) as our primary inference stack

• Work with vLLM / SGLang / TensorRT-LLM for serving at scale, with disciplined attention to throughput, tail latency, batching, KV-cache, and GPU economics

• Own fine-tuning workflows end-to-end (SFT, LoRA, QLoRA, DPO) on clinical notes, claims, and payer-rule data

• Optimize GPU usage, latency, batching, and cost; make build-vs-buy and hosted-vs-self-hosted trade-offs explicit and measured

2. Knowledge Graphs & Embedding-Based Retrieval

• Design and maintain the knowledge graph encoding ICD-10-CM, CPT, HCPCS, modifiers, HCC, NCCI edits, LCD/NCD policies, and payer-specific rules, along with the relationships between them

• Build embedding-based retrieval over clinical notes, historical claims, denial reasons, and payer-policy corpora — including chunking, embedding model selection, hybrid search, and reranking

• Combine graph traversal and dense retrieval so every coded line, scrubbed edit, and appeal response is grounded in auditable evidence

• Own ingestion, versioning, and quality of underlying knowledge sources (CMS, AHA, AMA, NCCI, payer bulletins)

3. Evaluation & Monitoring

• Build continuous evaluation pipelines that gate every model, prompt, retrieval, and graph change before production

• Run offline eval suites grounded in coder- and biller-validated labels; use LLM-as-judge where appropriate, calibrated against human ground truth

• Monitor drift, hallucinations, regressions, and output quality in production; operate shadow-mode rollouts and per-cohort accuracy tracking (specialty, payer, chart type)

• Track business metrics: chart-level and opportunity-level coding accuracy, denial rate impact, clean-claim rate, cost per chart, and end-to-end latency

4. LLM Systems & Prompt Engineering

• Design prompts and context pipelines for coding (CPT, ICD, HCC, E/M), claim edits, denial classification, and appeal drafting

• Implement structured outputs (JSON, function calling, constrained decoding) on top of the self-hosted stack

• Apply RAG over medical coding standards (CMS, ICD-10, AHA, NCCI) and payer policies, grounded in the knowledge graph and embedding stores

• Treat prompts as a thin, well-versioned, well-evaluated layer — never the load-bearing piece

5. Agentic Workflows & Tooling — MCP

• Build MCP servers for internal tools: code lookup, NCCI / rule checks, payer logic, eligibility, denial classification

• Design multi-step agent workflows with audit trails and human-in-the-loop checkpoints for coder, biller, and AR-analyst review

• Define clear boundaries between deterministic and LLM-based tools; reliability comes from knowing which is which

What We're Looking For

Must-Have

• 5+ years in ML/AI engineering, including 6+ months in production LLM systems

• Hands-on experience deploying and operating self-hosted LLMs (vLLM, SGLang, TensorRT-LLM, or equivalent)

• Strong experience designing embedding-based retrieval and/or knowledge graphs for grounded LLM applications

• Demonstrated ownership of evaluation infrastructure — offline benchmarks, online monitoring, drift and regression detection

• Strong Python + PyTorch + Hugging Face experience

• Production experience with monitoring, incidents, and system ownership

Strongly Preferred

• Fine-tuning experience (SFT, LoRA, QLoRA, DPO) on domain-specific corpora

• Experience with graph databases (Neo4j, ArangoDB, or equivalent) and graph-aware retrieval

• Experience with vector databases and hybrid search (BM25 + dense, rerankers)

• Familiarity with LLM observability tools (Langfuse, LangSmith, Arize, Braintrust, or in-house equivalents)

• Exposure to healthcare, RCM, claims, or other regulated domains

• Experience with MCP or similar tool-orchestration frameworks

• Strong prompt-engineering and LLM-evaluation instincts

What We Offer

• Work on high-impact healthcare AI systems used in real billing and RCM workflows

• Ownership of production LLM, retrieval, and evaluation systems end-to-end

• Solve real-world problems with real constraints (cost, latency, compliance, auditability)

Job ID: 148879423
