Carelon - Senior Data Scientist - NLP/LLM

Carelon Global Solutions India

Bengaluru, India

5-7 Years

Job Description

Job Position

We are looking to hire a Senior Data Scientist with strong analytical skills and a background in US healthcare. The ideal candidate should have:

A minimum of 5 years of overall experience in data science or related fields
At least 3 years of hands-on experience in Machine Learning (ML) and Natural Language Processing (NLP)
Candidates with proven expertise in healthcare data analytics and a solid understanding of healthcare systems in the US will be preferred

Job Responsibility

Key Responsibilities:

Demonstrate expertise in programming with a strong background in machine learning and data processing.
Possess strong analytical skills to interpret complex healthcare datasets and derive actionable insights.
Collaborate closely with AI/ML engineers, data scientists, and product teams to acquire and process data, debug issues, and enhance ML models.
Develop and maintain enterprise-grade data pipelines to support state-of-the-art AI/ML models.
Work with diverse data types including structured, semi-structured, and textual data.
Communicate effectively and collaborate with cross-functional teams including engineering, product, and customer stakeholders.
Operate independently with minimal guidance from product managers and architects, demonstrating strong decision-making capabilities.
Embrace complex problems and deliver intelligence-driven solutions with a focus on innovation and scalability.
Quickly understand product requirements and adapt to evolving business needs and technical environments.

Technical Responsibilities

Design and implement statistical and machine learning models (e.g., regression, classification, clustering) using frameworks such as scikit-learn, TensorFlow, and PyTorch.
Build robust data preprocessing pipelines to handle missing values, outliers, feature scaling, and dimensionality reduction.
Specialize in Large Language Model (LLM) development, including fine-tuning, prompt engineering, and embedding optimization using frameworks like Hugging Face Transformers.
Develop and optimize LLM evaluation frameworks using metrics such as ROUGE, BLEU, and custom human-aligned evaluation techniques.
Apply advanced statistical methods including hypothesis testing, confidence intervals, and experimental design to extract insights from complex datasets.
Create NLP solutions for text classification, sentiment analysis, and topic modeling using both classical and deep learning approaches.
Design and execute A/B testing strategies, including sample size determination, metric selection, and statistical analysis (e.g., t-tests, ANOVA).
Implement comprehensive data visualization strategies using tools like Matplotlib, Seaborn, and Plotly to present insights effectively.
Maintain detailed documentation of model architectures, experiments, and validation results using tools like MLflow or DVC.
Research and apply LLM optimization techniques such as quantization, pruning, and knowledge distillation to improve efficiency.
Stay up to date with the latest advancements in statistical learning, deep learning, and LLM research, with a focus on emerging architectures and training.

Qualifications

Bachelor's or master's degree in Computer Science, Mathematics or Statistics, Computational Linguistics, Engineering, or a related field. Ph.D. preferred.

Experience

5+ years of overall professional experience in data science, analytics, or related fields.
3+ years of hands-on experience working with large-scale structured and unstructured data to develop data-driven insights and solutions using Machine Learning (ML), Natural Language Processing (NLP), and Computer Vision.
3+ years of proven experience with core technologies including Python (mandatory), SQL, Hugging Face, TensorFlow, Keras, PyTorch, and Apache Spark.
3+ years of experience in developing NLP models, with a strong focus on transformer-based architectures.
2+ years of experience implementing information retrieval systems at scale, including both keyword-based and semantic search using embeddings.
Hands-on experience with cloud platforms such as Google Cloud Platform (GCP) and Amazon Web Services (AWS).
Strong expertise in Large Language Models (LLMs) and Generative AI (GAI), including model development, fine-tuning, and optimization.
Demonstrated ability to work independently with minimal supervision and exercise sound judgment in technical and business decision-making.
In-depth experience with LLMs (both extractive and generative), including prompt engineering, fine-tuning, and familiarity with open-source ecosystems.
Experience in prompt development and optimization for NLP applications.
Strategic thinker with a blend of technical expertise and business acumen, capable of solving complex problems and influencing outcomes.
Proficient in creating analytical reports, projections, models, and presentations to support business objectives.
Excellent written and verbal communication skills, with strong stakeholder management capabilities.
Prior experience in the healthcare industry, with an understanding of domain-specific data and regulatory considerations.

Skills And Competencies