Machine Learning Engineer

Smart Data Solutions

India

4-6 Years

Posted 6 months ago

Job Description

About Us

For over 20 years, Smart Data Solutions has been partnering with leading payer organizations to provide automation and technology solutions enabling data standardization and workflow automation. The company brings a comprehensive set of turn-key services to handle all claims and claims-related information regardless of format (paper, fax, electronic), digitizing and normalizing for seamless use by payer clients. Solutions include intelligent data capture, conversion and digitization, mailroom management, comprehensive clearinghouse services and proprietary workflow offerings. SDS is headquartered just outside of St. Paul, MN, and leverages dedicated onshore and offshore resources as part of its service delivery model. The company counts over 420 healthcare organizations as clients, including multiple Blue Cross Blue Shield state plans, large regional health plans and leading independent TPAs, handling over 500 million transactions of varying types annually with a 98%+ customer retention rate. SDS has also invested meaningfully in automation and machine learning capabilities across its tech-enabled processes to drive scalability and greater internal operating efficiency while also improving client results.

SDS recently partnered with a leading growth-oriented investment firm, Parthenon Capital, to further accelerate expansion and product innovation.

Location: 6th Floor, Block 4A, Millenia Business Park, Phase II, MGR Salai, Kandanchavadi, Perungudi, Chennai 600096, India.

Smart Data Solutions is an equal opportunity employer.

All qualified applicants will receive consideration for employment without regard to race, color, sex, sexual orientation, gender identity, religion, national origin, disability, veteran status, age, marital status, pregnancy, genetic information, or other legally protected status.

To perform this job successfully, an individual must be able to perform each essential duty satisfactorily. The requirements listed are representative of the knowledge, skill, and/or ability required. Reasonable accommodation may be made to enable individuals with disabilities to perform essential job functions.

Due to access to Protected Healthcare Information, employees in this role must be free of felony convictions on a background check report.

Responsibilities

Duties and Responsibilities include but are not limited to:

Design and build ML pipelines for OCR extraction, document image processing, and text classification tasks.
Fine-tune or prompt large language models (LLMs) (e.g., Qwen, GPT, LLaMA, Mistral) for domain-specific use cases.
Develop systems to extract structured data from scanned or unstructured documents (PDFs, images, TIFs).
Integrate OCR engines (Tesseract, EasyOCR, AWS Textract, etc.) and improve their accuracy via pre-/post-processing.
Handle natural language processing (NLP) tasks such as named entity recognition (NER), summarization, classification, and semantic similarity.
Collaborate with product managers, data engineers, and backend teams to productionize ML models.
Evaluate models using metrics like precision, recall, F1-score, and confusion matrix, and improve model robustness and generalizability.
Maintain proper versioning, reproducibility, and monitoring of ML models in production.

The duties set forth above are essential job functions for the role. Reasonable accommodations may be made to enable individuals with disabilities to perform essential job functions.

Skills And Qualifications

4-5 years of experience in machine learning, NLP, or AI roles.
Proficiency with Python and ML libraries such as PyTorch, TensorFlow, scikit-learn, and Hugging Face Transformers.
Experience with LLMs (open-source or proprietary), including fine-tuning or prompt engineering.
Solid experience in OCR tools (Tesseract, PaddleOCR, etc.) and document parsing.
Strong background in text classification, tokenization, and vectorization techniques (TF-IDF, embeddings, etc.).
Knowledge of handling unstructured data (text, scanned images, forms).
Familiarity with MLOps tools: MLflow, Docker, Git, and model serving frameworks.
Ability to write clean, modular, and production-ready code.
Experience working with medical, legal, or financial document processing.
Exposure to vector databases (e.g., FAISS, Pinecone, Weaviate) and semantic search.
Understanding of document layout analysis (e.g., LayoutLM, Donut, DocTR).
Familiarity with cloud platforms (AWS, GCP, Azure) and deploying models at scale.