ML Engineer, Test & Learn Platform (3+ Years of Experience)
About The Role
We're looking for an ML Engineer to join our Test & Learn Platform team. You'll build and scale our experimentation and causal inference services, from statistical engines to API integrations and cloud pipelines, empowering business teams globally to make data-driven decisions.
What You'll Do
- Develop and maintain statistical/ML modules (Difference-in-Differences (DID), Synthetic Control, A/B Testing, Multi-Treatment Effects) in Python
- Build and extend FastAPI services and integrate them with our web application via SDK wrappers
- Design and optimize large-scale data pipelines using PySpark, Delta Lake, and Azure Data Lake
- Profile and resolve out-of-memory (OOM) issues in PySpark jobs: optimize memory allocation, partitioning, broadcast joins, caching strategies, and Spark configurations
- Deploy and manage workloads on Databricks, including job clusters, notebooks, and Delta Lake tables
- Containerize and deploy services using Docker, Kubernetes, and CI/CD pipelines
- Ensure code quality and security via SonarCloud, Snyk, and pytest
- Collaborate with data scientists and product teams to translate research into production-ready modules
Must-Have Skills
- Python (3.9+): 3+ years of production experience
- PySpark & Spark Internals: strong experience with the Spark memory model, executor tuning, shuffle optimization, and diagnosing/resolving OOM errors (broadcast thresholds, partition skew, spill-to-disk, GC tuning)
- Databricks: hands-on experience with job orchestration, cluster configuration, notebook workflows, and Delta Lake optimization (Z-ordering, compaction, caching)
- Causal Inference & Experimentation: DID, synthetic control, A/B testing, hypothesis testing, panel data methods
- Statistics/ML Libraries: statsmodels, scikit-learn, scipy, pandas, numpy
- API Development: building RESTful services with FastAPI (or similar)
- Cloud (Azure): Azure Storage, Azure ML, Data Lake
- Docker & Kubernetes: containerization and orchestration for ML workloads
- Testing: writing robust unit/integration tests with pytest
Nice-to-Have
- Experience with Celery/Redis for async task orchestration
- Familiarity with Polars, PyArrow, or SQLAlchemy
- Background in econometrics or experimental design
- Spark UI profiling and performance benchmarking
- CI/CD tooling (SonarCloud, Snyk, GitHub Actions)
What Sets You Apart
- You can look at a Spark execution plan and pinpoint why a job is OOM-ing
- You think in modules: clean separation of data processing, inference, and post-processing
- You can go from a Jupyter notebook prototype to a production-grade, testable service
- You're comfortable with both statistical rigor and software engineering best practices
Python, PySpark, Databricks, Azure Cloud