Experience: 5+ years
Salary: Confidential (based on experience)
Expected Notice Period: 30 Days
Shift: (GMT+05:30) Asia/Kolkata (IST)
Opportunity Type: Hybrid
Placement Type: Full-Time Permanent position (Payroll and Compliance to be managed by: Confido Health)
(*Note: This is a requirement for one of Uplersʼ clients - Confido Health)
What do you need for this opportunity?
Must have skills required:
Conversational AI, ETL, Voice AI, (MongoDB OR Postgres), LLM
Confido Health is Looking for:
Job Title: Data Lead
Location: NYC / Bangalore
Reports to: Chief Product Officer
About Confido Health
Confido Health is transforming healthcare administration through AI-powered automation. We deploy intelligent voice agents that handle front-office and back-office workflows for specialty medical practices — from patient scheduling to revenue cycle tasks. Weʼve raised $13M and serve 1,000 sites across dental, ophthalmology, dermatology, and physical therapy. Weʼre scaling fast and redefining how healthcare communicates and operates.
Why this role exists
Confido's voice agents are already live in production. The data infrastructure is live too: ETL from Firebase, MongoDB, and Postgres into BigQuery; Metabase dashboards; task-level reporting; Cekura evals; Langfuse telemetry; and Prefect orchestration. What does not exist yet is the intelligence layer on top of that infrastructure. We need someone to define what excellent means for a Confido voice agent call — and build the measurement, feedback, and improvement system that gets us there.

This is not a pipeline engineering role. The infrastructure exists. Your job is to decide what matters, measure it rigorously, and create a closed loop that helps product, FDE, and implementation teams improve call quality week over week.

The goal is simple: Confido should be #1 in healthcare AI on automation rate and patient satisfaction. That only happens if we know why calls fail, what good patient experience actually sounds like, and whether every shipped improvement moved the numbers.

Generic data leaders will not work here. A dashboard that says "completion rate is down 6%" is not enough. We need someone who can listen to a call, identify whether the patient felt heard, convert that into a measurable rubric, calibrate an LLM judge against expert human review, and then prove whether the next agent release improved the experience.
What Youʼll Do
- Own automation rate and patient satisfaction as operating metrics — You will be accountable for the two numbers that matter most: 75% automation rate and 90% patient satisfaction. Automation rate needs to be understood per workflow, per client, and in real time. Patient satisfaction needs to be measured per call and benchmarked across the portfolio. Your job is not to build dashboards for these metrics. Your job is to make the numbers move, with evidence that cannot be disputed.
- Define and deploy conversation quality rubrics — Build the rubric system that turns qualitative concepts like empathy, tone, confusion, resolution satisfaction, and patient confidence into quantitative scores at production scale. Use LLM-as-judge evaluation alongside expert human review for high-stakes samples. Calibrate the judges, measure inter-rater reliability, and keep tuning the system as agent behavior changes.
- Build turn efficiency and voice/speech evaluation — Define what good conversation flow looks like by workflow. Identify optimal turn ranges by call type. Detect verbosity, repetition, dead time, redundant confirmations, and poor handoff moments. Evaluate the audio layer too: ASR accuracy on clinical entities, TTS naturalness, latency, endpointing, barge-in, and Spanish-English parity. Benchmark Retell and alternative vendors against production traffic.
- Close the evaluation loop — Build the system where production calls become evaluated data, evaluated data surfaces failures, failures turn into product or FDE action, and every shipped fix gets measured against the baseline. You own the proof that the loop is working — week over week, month over month, and deployment over deployment.
- Make evaluation actionable for FDEs — FDEs should reach for your reports before every client meeting. Translate evaluation output into Metabase views, client-ready insights, and operational formats that help FDEs know what to fix, what improved, and what risks to raise. This is not analytics consumption. This is operational tooling for the people closest to customers.
- Lead the data team through craft — This is a senior leader/IC hybrid role. You will pair on hard problems, review rubrics and pipelines, and set the standard for what "good" looks like. The team should get sharper because they work near you. Your leadership comes through proximity, judgment, and example — not process overhead.
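One concrete piece of the rubric calibration described above is inter-rater reliability: how closely the LLM judge agrees with expert human review beyond chance. A common measure is Cohen's kappa. Below is a minimal pure-Python sketch; the 1-5 empathy scores are invented for illustration and are not Confido's actual rubric.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters scoring the same items."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Probability of agreeing by chance, given each rater's label distribution.
    expected = sum(counts_a[c] * counts_b.get(c, 0) for c in counts_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical 1-5 empathy scores on ten sampled calls.
human = [4, 3, 5, 2, 4, 4, 1, 3, 5, 2]
judge = [4, 3, 4, 2, 4, 5, 1, 3, 5, 2]
print(round(cohens_kappa(human, judge), 3))  # kappa ≈ 0.744 on this sample
```

Rerunning this after each rubric revision, as the role description suggests, turns "the judge got better" into a number that can be tracked over time.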
What This Role Is Not
- You are not the pipeline engineer. The existing data engineer owns ETL, BigQuery pipelines, Metabase, and orchestration. You extend the stack; you do not rebuild it.
- You are not the BI analyst. The visualization layer exists. Your job is to define what should be measured, whether the measurement is valid, and how the resulting insight changes behavior.
- You are not a pure researcher. Academic rigor matters, but this is a production-facing role. Rubrics ship. Loops close. Numbers move.
- You are not here to produce reports that sit unused. If FDEs, product, and implementation teams are not making decisions from your work, the system is not working.
How Youʼll Work Day-to-day
- Monday morning you review production call performance from the previous week. Automation rate is down on prescription refill calls for one client, but the task completion metric alone does not explain why. You sample calls, find that the agent is technically completing the workflow but creating patient confusion through redundant confirmations, and update the rubric to capture it.
- Tuesday you sit with an FDE before a client meeting. Instead of giving them a generic dashboard, you show them the three workflows where patient experience is improving, the two failure types still driving escalations, and the exact before-and-after impact from last weekʼs agent change. The FDE uses your report in the meeting.
- Wednesday you calibrate an LLM judge against human review. The judge is over-scoring empathy on calls where the patient sounded frustrated but the agent used polite language. You revise the rubric, rerun the sample, and measure whether agreement improved.
- Thursday you benchmark voice vendors on real production traffic. Retell performs well on latency, but another vendor is stronger on barge-in and Spanish-English parity. You do not make a theoretical recommendation — you show the quantified tradeoff by workflow and call type.
- Friday you close the loop on a product release. The team shipped a fix to reduce agent verbosity on scheduling calls. You compare the new cohort against the baseline, show turn count reduction, patient confusion reduction, and no drop in completion rate. The team now knows the change worked.
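The Friday release check above can be made concrete. Assuming per-call turn counts are logged for both cohorts, a two-sample comparison (here Welch's t-statistic, which does not assume equal variances) shows whether the verbosity fix actually moved the number. The cohorts below are invented for illustration.

```python
from statistics import mean, stdev
from math import sqrt

def welch_t(baseline, cohort):
    """Welch's t-statistic: positive when the cohort mean is below the baseline mean."""
    va, vb = stdev(baseline) ** 2, stdev(cohort) ** 2
    return (mean(baseline) - mean(cohort)) / sqrt(va / len(baseline) + vb / len(cohort))

# Hypothetical per-call turn counts before and after the verbosity fix.
baseline = [14, 16, 13, 15, 17, 14, 16, 15]
post_fix = [11, 12, 10, 13, 11, 12, 10, 12]
print(f"mean turns {mean(baseline):.1f} -> {mean(post_fix):.1f}, t = {welch_t(baseline, post_fix):.2f}")
```

In practice one would also check completion rate and a confusion signal on the same cohorts, so a turn-count win cannot hide a quality regression.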
What weʼre looking for
- 5-10 years of experience in data, evaluation, ML systems, conversational AI, or applied analytics, with at least some experience operating in production environments where metrics directly influenced product decisions.
- Deep experience designing LLM-as-judge evaluation rubrics. You have built, calibrated, and iterated rubrics for conversational AI or similar systems. You know how to decompose a qualitative concept like "the patient felt heard" into a measurable, reproducible signal.
- Strong understanding of conversation quality and patient experience measurement. You can define and operationalize metrics for communication clarity, tone, active listening, patient confidence, confusion signals, and resolution satisfaction. You understand the difference between measuring what was said and measuring how it landed.
- Experience with turn efficiency and verbosity measurement. You can define optimal turn ranges by call type, build penalty systems for repetition and dead time, and score conversation flow quality systematically.
- Familiarity with voice AI systems. You understand ASR accuracy, TTS naturalness, endpointing, latency, barge-in, and bilingual parity. You have opinions on tools and vendors like Retell, Deepgram, AssemblyAI, Cartesia, and ElevenLabs, and you can benchmark them using production traffic.
- Comfort operating on top of existing production data infrastructure. You can work with BigQuery, Metabase, Langfuse, Cekura, Prefect, and GCP without needing the stack rebuilt from scratch.
- Track record building closed feedback loops. You have connected evaluation output to product improvement before — not just dashboards, but systems where changes ship and gains are measured against baselines.
- Strong product judgment and operational empathy. You know how to build reporting that practitioners actually use. You understand that a technically correct metric is useless if FDEs and builders do not know what to do with it.
- Technical leadership through proximity. You raise the bar by pairing, reviewing, and modeling excellent work. You can lead without hiding behind process.
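The turn-efficiency requirement above (optimal turn ranges, penalty systems for repetition) can be sketched as a simple scoring function. Everything here, including the optimal range, the penalty weights, and the exact-match repeat detection, is an illustrative assumption rather than an actual rubric; a production version would use per-workflow ranges and fuzzy matching.

```python
def flow_score(turns, optimal_range=(6, 14), turn_penalty=0.05, repeat_penalty=0.10):
    """Score conversation flow on [0, 1]: penalize turn counts outside the
    optimal range and verbatim-repeated agent utterances. Weights are illustrative."""
    score = 1.0
    lo, hi = optimal_range
    n = len(turns)
    if n > hi:
        score -= turn_penalty * (n - hi)   # call dragged on
    elif n < lo:
        score -= turn_penalty * (lo - n)   # call may have been cut short
    seen = set()
    for speaker, text in turns:
        if speaker == "agent":
            key = text.strip().lower()
            if key in seen:
                score -= repeat_penalty    # redundant confirmation / repetition
            seen.add(key)
    return max(0.0, score)

clean_call = [
    ("agent", "Hi, how can I help?"), ("patient", "I need a refill."),
    ("agent", "Which medication?"), ("patient", "Lisinopril."),
    ("agent", "Done, anything else?"), ("patient", "No."),
    ("agent", "Goodbye."), ("patient", "Bye."),
]
print(flow_score(clean_call))  # 1.0: in range, no repeats
```

A score like this only becomes useful once its penalties are validated against human judgment of the same calls, which is exactly the calibration loop the role owns.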
Nice-to-haves
- Healthcare data or HIPAA compliance experience.
- Experience evaluating bilingual voice AI systems, especially Spanish-English production calls.
- Experience in a high-growth startup environment where systems had to scale from 10 clients to 100 clients.
- Experience with clinical entity recognition in ASR or NLP contexts.
- Prior exposure to healthcare operations, revenue cycle workflows, patient access, scheduling, referrals, or prescription refill workflows.
Why join now
- The agents are already live. You are not inventing a theoretical evaluation system — you are building the operating layer for production AI used by real patients and real healthcare practices.
- The infrastructure exists. You inherit BigQuery, Metabase, Cekura, Langfuse, and Prefect. Your leverage comes from defining what to measure and making the system smarter.
- The problem is urgent. Automation rate and patient satisfaction are the two numbers that determine whether Confido becomes the category leader in healthcare AI.
- Your work will directly shape product quality, customer trust, and revenue expansion. When evaluation improves, agents improve. When agents improve, customers expand.
- This is a rare chance to define the quality standard for AI voice agents in healthcare — not in a lab, but in production.
How to apply for this opportunity
- Step 1: Click On Apply! And Register or Login on our portal.
- Step 2: Complete the Screening Form & Upload updated Resume
- Step 3: Increase your chances of getting shortlisted and meet the client for the interview!
About Uplers:
Our goal is to make hiring reliable, simple, and fast. Our role is to help our talent find and apply for relevant contractual onsite opportunities and progress in their careers. We will support you through any grievances or challenges you may face during the engagement.
(Note: There are many more opportunities apart from this on the portal. Depending on the assessments you clear, you can apply for them as well).
So, if you are ready for a new challenge, a great work environment, and an opportunity to take your career to the next level, don't hesitate to apply today. We are waiting for you!