SENIOR SCIENTIST - Machine Learning

Happiest Minds Technologies

Bengaluru, India

3-5 Years

Posted 10 hours ago

Job Description

ML Engineer, Test & Learn Platform (3+ Years Experience)

About The Role

We're looking for an ML Engineer to join our Test & Learn Platform team. You'll build and scale our experimentation and causal inference services, from statistical engines to API integrations and cloud pipelines, empowering business teams globally to make data-driven decisions.

What You'll Do

Develop and maintain statistical/ML modules (DID, Synthetic Control, A/B Testing, Multi-Treatment Effects) in Python
Build and extend FastAPI services and integrate them with our web application via SDK wrappers
Design and optimize large-scale data pipelines using PySpark, Delta Lake, and Azure Data Lake
Profile and resolve OOM issues in PySpark jobs: optimize memory allocation, partitioning, broadcast joins, caching strategies, and Spark configurations
Deploy and manage workloads on Databricks, including job clusters, notebooks, and Delta Lake tables
Containerize and deploy services using Docker, Kubernetes, and CI/CD pipelines
Ensure code quality and security via SonarCloud, Snyk, and pytest
Collaborate with data scientists and product teams to translate research into production-ready modules

Must-Have Skills

Python (3.9+): 3+ years of production experience
PySpark & Spark Internals: strong experience with the Spark memory model, executor tuning, shuffle optimization, and diagnosing/resolving OOM errors (broadcast thresholds, partition skew, spill-to-disk, GC tuning)
Databricks: hands-on experience with job orchestration, cluster configuration, notebook workflows, and Delta Lake optimization (Z-ordering, compaction, caching)
Causal Inference & Experimentation: DID, synthetic control, A/B testing, hypothesis testing, panel data methods
Statistics/ML Libraries: statsmodels, scikit-learn, scipy, pandas, numpy
API Development: building RESTful services with FastAPI (or similar)
Cloud (Azure): Azure Storage, Azure ML, Data Lake
Docker & Kubernetes: containerization and orchestration for ML workloads
Testing: writing robust unit/integration tests with pytest

Nice-to-Have

Experience with Celery/Redis for async task orchestration
Familiarity with Polars, PyArrow, or SQLAlchemy
Background in econometrics or experimental design
Spark UI profiling and performance benchmarking
CI/CD tooling (SonarCloud, Snyk, GitHub Actions)

What Sets You Apart

You can look at a Spark execution plan and pinpoint why a job is OOM-ing
You think in modules: clean separation of data processing, inference, and post-processing
You can go from a Jupyter notebook prototype to a production-grade, testable service
You're comfortable with both statistical rigor and software engineering best practices

Python, PySpark, Databricks, Azure Cloud