Data Science & AI/ML Lead (EDA Experience)
Level: SM
Role Overview
A hands-on Data Science and AI/ML Lead who owns the end-to-end model training lifecycle, from exploratory data analysis (EDA) and feature engineering through training, evaluation, and deployment readiness. The role focuses on building reproducible, production-grade ML pipelines and on ensuring that data and models are optimized for performance, scalability, and reliability.
Key Responsibilities
- Exploratory Data Analysis & Model Development
  - Translate business problems and use cases into model-ready ML formulations
  - Perform deep EDA and data profiling to understand patterns, data quality, and feature relevance
  - Define a feature engineering strategy aligned with model performance objectives
  - Ensure reproducibility through dataset versioning and experiment tracking
  - Define the pipeline strategy for continuous retraining and validation
  - Train and optimize models for classification, regression, clustering, and anomaly detection, as well as LLM/SLM pretraining and fine-tuning
  - Perform hyperparameter tuning and model selection for optimal performance
  - Drive trade-off decisions across accuracy, latency, cost, and interpretability
- Scoring, Evaluation & Benchmarking
  - Define evaluation and scoring frameworks for datasets, and certify datasets as AI-ready for model training
  - Conduct error analysis and benchmarking across datasets and model versions
  - Establish acceptance thresholds and quality gates for production readiness
- Scalable ML & MLOps Enablement
  - Enable ML lifecycle practices including model versioning, tracking, and monitoring
  - Work with cloud platforms (Azure/AWS/GCP) for scalable training and deployment
  - Collaborate with engineering teams to ensure production-grade integration
  - Optimize platform performance, reliability, and scalability
Required Capabilities / Skills / Experience
- 12+ years of hands-on experience in Data Science / Machine Learning
- Strong expertise in Python and ML/DL frameworks (scikit-learn, PyTorch, TensorFlow)
- Deep experience in EDA, feature engineering, and model training pipelines
- Experience building production-grade ML pipelines and evaluation frameworks
- Exposure to cloud ML platforms (Azure Machine Learning, Vertex AI, SageMaker)
- Experience with large-scale data processing and distributed training
- Hands-on experience with classical ML algorithms (e.g. Decision Trees, Random Forest, Gradient Boosting, XGBoost)
- Exposure to LLM/SLM training or fine-tuning techniques (e.g. PEFT methods such as LoRA)
- Exposure to integrating LLM/GenAI workflows with ML pipelines
- Familiarity with data quality, labelling, and dataset curation at scale
- Strong problem-solving and systems-thinking skills