Data Scientist (Remote)

Codvo.ai

Pune, India

4-6 Years

About Us:

At Codvo, we are committed to building scalable, future-ready data platforms that power business impact. We believe in a culture of innovation, collaboration, and growth, where engineers can experiment, learn, and thrive. Join us to be part of a team that solves complex data challenges with creativity and cutting-edge technology.

Role Summary

Owns model development, the training pipeline, and the analytics backend. The role works in close coordination with the on-site Data Scientist: the on-site person provides site context and validation feedback, while the offshore person (this role) implements model improvements, retraining logic, and drift detection.

Responsibilities

Model Development & Training

Maintain and improve the physics-based simulation engine: 19 equipment families, 64+ fault signatures, and first-principles governing equations
Run model training pipelines: dataset generation, feature engineering, model fitting, hyperparameter tuning, and MLflow experiment tracking
Implement model retraining triggers: drift detection (PSI-based), accuracy degradation monitoring, and scheduled recalibration
Build and maintain the champion/challenger evaluation framework: shadow scoring, A/B testing, and promotion guardrails
Develop new fault signatures as customer feedback identifies gaps
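
The PSI-based retraining trigger mentioned above can be illustrated in a few lines of standard-library Python. This is a minimal sketch and not Codvo's implementation. The function names are hypothetical. Bins are assumed to sit at the deciles of the reference (training-time) data, and empty bins are floored at a small epsilon to avoid log(0). The 0.25 threshold is a commonly quoted rule of thumb, not a value from this posting.

```python
# Illustrative sketch: function names, binning scheme and threshold are assumptions.
import bisect
import math
from typing import List, Sequence


def quantile_edges(reference: Sequence[float], n_bins: int = 10) -> List[float]:
    """Interior bin edges at the reference quantiles (n_bins - 1 edges)."""
    ordered = sorted(reference)
    n = len(ordered)
    return [ordered[(n * k) // n_bins] for k in range(1, n_bins)]


def bin_fractions(values: Sequence[float], edges: List[float], eps: float = 1e-6) -> List[float]:
    """Fraction of values per bin, floored at eps so empty bins never produce log(0)."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect.bisect_right(edges, v)] += 1
    total = len(values)
    return [max(c / total, eps) for c in counts]


def psi(reference: Sequence[float], current: Sequence[float], n_bins: int = 10) -> float:
    """Population Stability Index: sum over bins of (actual - expected) * ln(actual / expected)."""
    edges = quantile_edges(reference, n_bins)
    expected = bin_fractions(reference, edges)
    actual = bin_fractions(current, edges)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


def should_retrain(reference: Sequence[float], current: Sequence[float], threshold: float = 0.25) -> bool:
    """Hypothetical trigger: flag retraining when a feature's PSI exceeds the threshold."""
    return psi(reference, current) > threshold


if __name__ == "__main__":
    baseline = [i / 1000 for i in range(1000)]        # training-time distribution
    shifted = [0.5 + i / 2000 for i in range(1000)]   # live data drifted upward
    print(round(psi(baseline, baseline), 6))          # 0.0: identical distributions
    print(should_retrain(baseline, shifted))          # True
```

In practice a trigger like this would run per feature and per equipment family. It would then be combined with the accuracy-degradation and scheduled checks listed above rather than replace them.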

Analytics & Calibration

Implement probability calibration: Platt scaling, isotonic regression, and expected calibration error (ECE) monitoring
Build the adaptive threshold controller: feedback-driven adjustment of alarm thresholds based on false alarm rate and recall
Develop the CMMS label linking pipeline: match work orders to predictions with confidence scoring
Analyze prediction outcomes: precision, recall, and F1 by equipment family, fault type, and site
Produce the weekly and monthly accuracy reports
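
Two of the analytics items above, ECE monitoring and per-segment precision/recall/F1, come down to simple bookkeeping. The sketch below is standard-library only and purely illustrative:

- The function names and the `(group, y_true, y_pred)` record layout are hypothetical.
- ECE here is the binary variant with ten equal-width bins. It compares the mean predicted fault probability with the observed fault rate in each bin.
- Undefined ratios default to 0.0.

In a scikit-learn stack, `CalibratedClassifierCV` with `method="sigmoid"` (Platt scaling) or `method="isotonic"` fits the calibrators themselves. `precision_recall_fscore_support` computes the metrics for a single segment.

```python
# Illustrative sketch: function names and record layout are hypothetical.
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple


def expected_calibration_error(probs: Sequence[float], labels: Sequence[int], n_bins: int = 10) -> float:
    """Weighted mean gap between predicted probability and observed positive rate per bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls into the last bin
        bins[idx].append((p, y))
    n = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_prob = sum(p for p, _ in members) / len(members)
        pos_rate = sum(y for _, y in members) / len(members)
        ece += len(members) / n * abs(pos_rate - mean_prob)
    return ece


def metrics_by_group(records: Iterable[Tuple[str, int, int]]) -> Dict[str, Dict[str, float]]:
    """Precision, recall and F1 per group (e.g. equipment family, fault type, or site)."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for group, y_true, y_pred in records:
        c = counts[group]  # touching the key also registers groups with only true negatives
        if y_pred and y_true:
            c["tp"] += 1
        elif y_pred:
            c["fp"] += 1
        elif y_true:
            c["fn"] += 1
    results = {}
    for group, c in counts.items():
        predicted, actual = c["tp"] + c["fp"], c["tp"] + c["fn"]
        precision = c["tp"] / predicted if predicted else 0.0
        recall = c["tp"] / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        results[group] = {"precision": precision, "recall": recall, "f1": f1}
    return results


if __name__ == "__main__":
    # Ten predictions at 0.9 where only half are real faults: roughly 0.4, i.e. overconfident
    print(expected_calibration_error([0.9] * 10, [1] * 5 + [0] * 5))
    # pump: precision = recall = F1 = 0.5; fan has no positives, so all metrics are 0.0
    print(metrics_by_group([("pump", 1, 1), ("pump", 0, 1), ("pump", 1, 0), ("fan", 0, 0)]))
```

Equal-width bins keep the sketch simple. With sparse fault labels, quantile bins or a minimum bin count may give a more stable ECE.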

Feature Engineering & Data Quality

Define and maintain feature sets for each equipment family: physics-informed features, rolling statistics, and cross-tag correlations
Monitor data quality metrics: null rates, stale timestamps, schema violations, and sensor drift
Build the healthy baseline update pipeline: daily computation of per-tag statistics from healthy operating data
Implement the training data snapshot pipeline: versioned, reproducible dataset extraction with manifest tracking
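
The per-tag data-quality checks and the healthy baseline described above can be sketched with the standard library. Everything here is an assumption for illustration:

- Readings arrive as a dict mapping each tag to `(timestamp, value)` pairs, with `None` for missing values.
- The 15-minute staleness window is arbitrary.
- Healthy operating windows are selected upstream.
- The function names are hypothetical.

The z-score helper shows one way a daily baseline could feed a simple deviation feature.

```python
# Illustrative sketch: data layout, staleness window and function names are hypothetical.
import statistics
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple

Reading = Tuple[datetime, Optional[float]]


def data_quality(readings: Dict[str, List[Reading]], now: datetime,
                 stale_after: timedelta = timedelta(minutes=15)) -> Dict[str, dict]:
    """Null rate and staleness flag per tag."""
    report = {}
    for tag, rows in readings.items():
        if not rows:
            report[tag] = {"null_rate": 1.0, "stale": True}  # no data received at all
            continue
        nulls = sum(1 for _, value in rows if value is None)
        last_seen = max(ts for ts, _ in rows)  # most recent timestamp, null or not
        report[tag] = {"null_rate": nulls / len(rows), "stale": now - last_seen > stale_after}
    return report


def healthy_baseline(healthy: Dict[str, List[Optional[float]]]) -> Dict[str, dict]:
    """Per-tag mean, sample std and count over values already filtered to healthy operation."""
    baseline = {}
    for tag, values in healthy.items():
        clean = [v for v in values if v is not None]
        if len(clean) < 2:
            continue  # a sample standard deviation needs at least two points
        baseline[tag] = {"mean": statistics.fmean(clean), "std": statistics.stdev(clean), "n": len(clean)}
    return baseline


def deviation(baseline: Dict[str, dict], tag: str, value: float) -> float:
    """z-score of a live value against the tag's healthy baseline (0.0 for a constant tag)."""
    stats = baseline[tag]
    return (value - stats["mean"]) / stats["std"] if stats["std"] > 0 else 0.0


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0)
    live = {"temp": [(now - timedelta(minutes=5), 13.0), (now - timedelta(minutes=1), None)],
            "flow": [(now - timedelta(hours=1), 3.0)]}
    print(data_quality(live, now))  # temp: null_rate 0.5 and fresh; flow: stale
    base = healthy_baseline({"temp": [10.0, 12.0, None, 14.0]})
    print(deviation(base, "temp", 16.0))  # 2.0
```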

Expected Background

4+ years in machine learning engineering or applied data science
Strong Python skills: pandas, scikit-learn, XGBoost/LightGBM, MLflow
Experience with time-series data, anomaly detection, or predictive maintenance modeling
Understanding of model deployment patterns: model registry, versioning, A/B testing, canary deployments
Experience with statistical process control, calibration, or reliability engineering is a plus