Principal Data Scientist

  • Posted 23 days ago

Job Description

Optum is a global organization that delivers care, aided by technology, to help millions of people live healthier lives. The work you do with our team will directly improve health outcomes by connecting people with the care, pharmacy benefits, data and resources they need to feel their best. Here, you will find a culture guided by inclusion, talented peers, comprehensive benefits and career development opportunities. Come make an impact on the communities we serve as you help us advance health optimization on a global scale. Join us to start Caring. Connecting. Growing together.

The Principal Data Scientist ML/DL is a primary driver in the design and development of state-of-the-art Artificial Intelligence solutions for medical applications. The Principal Data Scientist ML/DL works closely with senior data scientists, machine learning engineers, software engineers and subject matter experts on current company technologies and forward-looking projects. They are the drivers of new research and solution implementation in the creation of novel artificial intelligence approaches.

Primary responsibilities include the enhancement of existing company NLP technologies and the extension of those systems into new cloud-based applications. Emphasis is on the development of novel machine/deep learning techniques for information extraction and synthesis. The role translates research code, spanning statistical methods, deep learning, and large language model technologies, into clinical NLP solutions deployed at scale in production environments. Work will involve all aspects of methods development, from initial PoC implementation to performance characterization and production launch of new methods.

The successful candidate will have a solid history of publication in Machine/Deep Learning with an emphasis on Natural Language Processing, Information Retrieval and/or Information Extraction. Exposure to recent research literature and the ability to effectively implement new technologies is key. The successful candidate will have proven success in taking machine/deep learning solutions to production environments. Solid technical skills are required.

Primary Responsibilities:

  • Lead end-to-end training and fine-tuning of Large Language Models (LLMs), including both open-source (e.g., Qwen, LLaMA, Mistral) and closed-source (e.g., OpenAI, Gemini, Anthropic) ecosystems.
  • Architect and implement GraphRAG pipelines, including knowledge graph representation and retrieval for enhanced contextual grounding.
  • Design, train, and optimize semantic and dense vector embeddings for document understanding, search, and retrieval.
  • Develop semantic retrieval systems with advanced document segmentation and indexing strategies.
  • Build and scale distributed training environments using NCCL and InfiniBand for multi-GPU and multi-node training.
  • Apply reinforcement learning techniques (e.g., RLHF, RLAIF) to align model behavior with human preferences and domain-specific goals.
  • Collaborate with cross-functional teams to translate business needs into AI-driven solutions and deploy them in production environments.
  • Comply with the terms and conditions of the employment contract, company policies and procedures, and any and all directives (such as, but not limited to, transfer and/or re-assignment to different work locations, change in teams and/or work shifts, policies regarding flexibility of work benefits and/or work environment, alternative work arrangements, and other decisions that may arise due to the changing business environment). The Company may adopt, vary or rescind these policies and directives in its absolute discretion and without any limitation (implied or otherwise) on its ability to do so.

Required Qualifications:

  • Deep knowledge and extensive experience with Machine/Deep Learning frameworks including transformer architectures, state space models, large language models, and agentic approaches
  • Knowledge of algorithms and techniques within a computational domain with emphasis on text processing
  • Demonstrated publication record in the AI domain, especially relating to text extraction and summarization
  • Experience with Hybrid NLP solutions that combine symbolic and machine learning approaches

Preferred Qualifications:

  • PhD or master's degree in Computer Science, Machine Learning, or a related field
  • 12+ years of experience in applied AI/ML with statistics, with a strong track record of delivering production-grade models
  • Deep expertise in NLP, fundamental machine learning, deep learning, and transformer and state space-based architectures
  • Azure ML and/or AWS
  • Exploratory Data Analysis (EDA)
  • Experience with PyTorch
  • Experience with LLM training and fine-tuning (e.g., GPT, LLaMA, Mistral, Qwen)
  • Experience with graph-based retrieval systems (GraphRAG, knowledge graphs)
  • Experience with embedding models (e.g., BGE, E5, SimCSE)
  • Experience with semantic search and vector databases (e.g., FAISS, Weaviate, Milvus)
  • Experience with document segmentation and preprocessing (OCR, layout parsing)
  • Experience with distributed training frameworks (NCCL, Horovod, DeepSpeed)
  • Experience with high-performance networking (InfiniBand, RDMA)
  • Experience with model fusion and ensemble techniques (stacking, boosting, gating)
  • Experience with optimization algorithms (Bayesian, Particle Swarm, Genetic Algorithms)
  • Experience with Symbolic AI and rule-based systems
  • Experience with meta-learning and Mixture of Experts architectures
  • Experience with reinforcement learning (e.g., RLHF, PPO, DPO, GRPO), supervised fine-tuning (SFT), LoRA, QLoRA, and Axolotl
  • Experience with prompt optimization frameworks (e.g., AutoPrompt, GreaterPrompt, DSPy, GEPA)
  • Proven proficiency in Python coding, SQL and database queries, data preparation, and analysis

Bonus Skills:

  • Experience with healthcare data and medical coding systems (e.g., CPT, ICD-10-CM, ICD-10-PCS)
  • Familiarity with regulatory and compliance frameworks in AI deployment
  • Contributions to open-source AI projects or published research, and/or the ability to take research papers from PoC to production

About Company

Optum, Inc. is an American pharmacy benefit manager and health care provider. It has been a subsidiary of UnitedHealth Group since 2011. UHG formed Optum by merging its existing pharmacy and care delivery services into the single Optum brand, comprising three main businesses: OptumHealth, OptumInsight and OptumRx. In 2017, Optum accounted for 44 percent of UnitedHealth Group's profits, and as of 2019, Optum's revenues had surpassed $100 billion. Also in early 2019, Optum gained significant media attention over a trade secrets lawsuit that the company filed against former executive David William Smith, after Smith left Optum to join Haven, the joint healthcare venture of Amazon, JPMorgan Chase, and Berkshire Hathaway.

Job ID: 144202283