We're hiring a Data Scientist / Data Engineer to help us turn raw data into reliable datasets, insights, and models that drive real decisions. This role blends strong data engineering (pipelines, quality, orchestration) with hands-on data science (analysis, experimentation, forecasting, ML when needed). You'll work closely with product and engineering teams to build data products that are accurate, scalable, and actionable.
What you'll do
- Design and build end-to-end data pipelines (batch and, if applicable, streaming); a minimal sketch of what this looks like follows this list.
- Collect, clean, transform, and model data into well-structured datasets for analytics and ML.
- Develop and maintain a data warehouse/lake model (dimensional modeling, data marts, curated layers).
- Implement data quality checks, observability, lineage, and monitoring.
- Perform exploratory analysis and deliver insights via dashboards, notebooks, and stakeholder-ready summaries.
- Build and deploy ML models when needed (forecasting, churn/segmentation, anomaly detection, recommendations).
- Support experimentation and A/B testing (metric definitions, evaluation, statistical validity).
- Collaborate with backend teams to define event schemas, tracking plans, and data contracts.
- Optimize performance and cost across storage, compute, and queries.
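To make the pipeline and data-quality bullets above concrete, here is a minimal sketch of the kind of daily batch job this role owns, written against Airflow's TaskFlow API (Airflow 2.4+). The dataset, field names, and checks are invented for illustration, not a description of our actual stack:

```python
# A daily batch pipeline with an explicit data quality gate.
# Table/field names and thresholds are hypothetical placeholders.
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def orders_pipeline():
    @task
    def extract() -> list[dict]:
        # In practice: pull from an API, a source database, or object storage.
        return [{"order_id": 1, "amount": 42.0}, {"order_id": 2, "amount": 13.5}]

    @task
    def transform(rows: list[dict]) -> list[dict]:
        # Clean and reshape into the curated schema.
        return [r for r in rows if r["amount"] > 0]

    @task
    def quality_check(rows: list[dict]) -> None:
        # Fail the run loudly instead of loading bad data downstream.
        assert rows, "quality check failed: no rows survived transformation"
        assert all(r.get("order_id") is not None for r in rows), "null order_id"

    quality_check(transform(extract()))


orders_pipeline()
```

The same shape carries over to Dagster or Prefect; what we care about is idempotent tasks and an explicit quality gate before anything lands in the warehouse.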
Must-have skills
- Strong SQL and solid programming skills (Python preferred).
- Experience building pipelines with an orchestrator (Airflow / Dagster / Prefect, or equivalent).
- Strong knowledge of data modeling (star schema, slowly changing dimensions, event modeling).
- Experience with at least one of: PostgreSQL / MySQL / BigQuery / Snowflake / Redshift.
- Proven ability to validate data correctness and implement data quality frameworks (see the validation sketch after this list).
- Comfortable communicating insights and technical trade-offs to non-technical stakeholders.
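As a rough illustration of the data-quality expectation, here is a small pandas sketch of the kind of lightweight checks we mean. Column names and rules are invented for the example; in practice you might express these as dbt tests or Great Expectations suites instead:

```python
# Lightweight correctness checks on a curated dataset (illustrative only;
# column names and rules are hypothetical).
import pandas as pd


def validate_orders(df: pd.DataFrame) -> list[str]:
    """Return human-readable violations; an empty list means the data passed."""
    violations = []
    if df["order_id"].isna().any():
        violations.append("order_id contains nulls")
    if df["order_id"].duplicated().any():
        violations.append("order_id is not unique")
    if (df["amount"] < 0).any():
        violations.append("amount contains negative values")
    if df["created_at"].max() > pd.Timestamp.now():
        violations.append("created_at contains future timestamps")
    return violations


df = pd.DataFrame({
    "order_id": [1, 2, 2],
    "amount": [42.0, -1.0, 13.5],
    "created_at": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-02"]),
})
print(validate_orders(df))  # ['order_id is not unique', 'amount contains negative values']
```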
Nice-to-have skills
- Streaming: Kafka / Kinesis / Pub/Sub, real-time processing (Spark Structured Streaming / Flink); see the consumer sketch after this list.
- Big data: Spark, distributed compute, partitioning strategies.
- Lakehouse: Iceberg / Delta / Hudi, object storage (S3/GCS/Azure Blob).
- MLOps: MLflow, model monitoring, feature stores, deployment pipelines.
- BI: Superset / Power BI / Looker / Metabase, semantic layers.
- Cloud: AWS/Azure/GCP (IAM, networking basics, managed data services).
- Experience with privacy/security compliance (PII handling, retention policies, access controls).
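None of these are required on day one. For a sense of the streaming work, here is roughly the shape of a consumer, sketched with the kafka-python client; the topic name, broker address, and event fields are placeholders:

```python
# Skeleton of a streaming consumer (kafka-python).
# Topic, broker, and event fields are hypothetical placeholders.
import json

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "events.orders",                     # hypothetical topic
    bootstrap_servers="localhost:9092",  # placeholder broker address
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    auto_offset_reset="earliest",
)

for message in consumer:
    event = message.value
    # A real pipeline would validate against the event schema (data contract)
    # and write to a sink; printing stands in for that here.
    print(event.get("order_id"), event.get("amount"))
```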
What we value
- Ownership: you build reliable systems, not just one-off scripts.
- Curiosity: you ask why metrics look the way they do and propose better approaches.
- Practicality: you can balance speed vs correctness and deliver iteratively.
- Collaboration: you work well with engineers, product, and leadership.