Data/ML Engineer

  • Posted 5 hours ago
  • Be among the first 10 applicants

Job Description

About the Role

We are looking for a Data Engineer / Machine Learning Engineer to build and scale the data and intelligence layer powering our multi-tenant SaaS platform. You will own data pipelines that ingest, transform, and serve billions of events across analytics, billing, and product surfaces — and increasingly power ML-driven features such as scoring, recommendations, and intelligent automation.

This is a hands-on role with deep ownership: you will design schemas, build pipelines end-to-end, optimize query performance on ClickHouse, and ship ML workflows into production. You will work closely with backend, product, and founding engineering leadership.

Core Responsibilities

  • Design, build, and maintain scalable batch and streaming data pipelines using Apache Airflow and Python.
  • Model and optimize analytical workloads on ClickHouse — including partition strategy, sort keys, materialized views, and ReplacingMergeTree / AggregatingMergeTree patterns.
  • Build and maintain ETL/ELT workflows for ingestion from operational stores (MySQL, PostgreSQL, MongoDB) into the analytics warehouse.
  • Develop, deploy, and monitor machine learning models — from feature engineering to training, evaluation, and production serving.
  • Define and enforce data contracts, schema evolution, and data quality checks across services.
  • Partner with backend teams to instrument event tracking and ensure data correctness across multi-tenant boundaries.
  • Optimize query performance and cost; investigate and resolve slow queries, full partition scans, and skew issues.
  • Contribute to the MLOps stack: model versioning, experiment tracking, monitoring, and retraining pipelines.
  • Write clean, tested, well-documented code. Participate in code reviews and design discussions.
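
The idempotency and failure-recovery expectations above can be sketched in plain Python. The helper names and the overwrite-style load below are illustrative assumptions, not the platform's actual pipeline code: the key idea is that a backfill splits into per-day windows, and each run fully recomputes and overwrites its target partition so retries never double-count.

```python
from datetime import date, timedelta

def daily_windows(start: date, end: date):
    """Yield one (window_start, window_end) pair per day for a
    backfill over the inclusive range [start, end]."""
    day = start
    while day <= end:
        yield day, day + timedelta(days=1)
        day += timedelta(days=1)

def run_partition(store: dict, events: list, day: date) -> None:
    """Idempotent load: recompute the day's aggregate from source
    events and overwrite the target partition. Re-running the same
    window produces the same result, so retries are safe."""
    rows = [e for e in events if e["day"] == day]
    store[day] = sum(e["amount"] for e in rows)  # full overwrite, never +=
```

Running `run_partition` twice for the same day leaves the store unchanged, which is exactly the property that makes Airflow retries and backfills safe.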

Mandatory Skills

  • Python Programming — strong proficiency, including pandas, NumPy, and production-grade code (typing, packaging, testing).
  • Data Pipelines — solid experience designing batch and/or streaming pipelines, with awareness of idempotency, backfills, and failure recovery.
  • Apache Airflow — authoring DAGs, custom operators, sensors, and managing dependencies in production.
  • ClickHouse — hands-on experience with table engines (MergeTree family), partitioning, sort keys, and materialized views.
  • SQL — advanced proficiency: window functions, CTEs, query plans, and performance tuning on large datasets.
  • Relational and NoSQL databases — working knowledge of PostgreSQL and MongoDB (schemas, indexing, CDC patterns).
  • Distributed data processing — practical experience with PySpark, Dask, or equivalent for large-scale transforms.
  • Message brokers & streaming — hands-on experience with RabbitMQ and Apache Kafka; understanding of producers/consumers, partitioning, consumer groups, delivery guarantees, and dead-letter handling.
  • Machine Learning fundamentals — supervised/unsupervised techniques, model evaluation, and at least one framework (scikit-learn, PyTorch, or TensorFlow).
  • Version control — Git and collaborative workflows (PRs, code reviews).
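
The streaming items above hinge on two ideas worth making concrete: keyed messages hash to a stable partition (preserving per-key ordering), and a consumer group splits partitions so each has exactly one owner. The sketch below is a simplified illustration, not Kafka's actual implementation — Kafka's default partitioner uses murmur2, while CRC32 here just stands in for any stable hash:

```python
import zlib

def partition_for(key: bytes, num_partitions: int = 6) -> int:
    """Route a keyed message to a partition via a stable hash, so all
    events for one key (e.g. one tenant) stay in order on one partition."""
    return zlib.crc32(key) % num_partitions

def assign_partitions(partitions: range, consumers: list) -> dict:
    """Round-robin group assignment: each partition gets exactly one
    consumer, so the group scales out without double-consuming."""
    assignment = {c: [] for c in consumers}
    for p in partitions:
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment
```

Because assignment is per-partition rather than per-message, adding consumers beyond the partition count yields idle consumers — a common sizing consideration in both Kafka and RabbitMQ stream setups.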

Preferred Skills

  • Experience in agile development environments.
  • Familiarity with DevOps tools and CI/CD pipelines (e.g., GitHub Actions, GitLab CI, Jenkins).
  • Knowledge of containerization tools like Docker and orchestration platforms like Kubernetes.
  • Exposure to cloud platforms like AWS or GCP (BigQuery, Cloud Composer, GKE, Pub/Sub, Dataflow) is a plus.
  • Familiarity with CDC tools (Debezium) and stream processing frameworks (Kafka Streams, Flink).
  • Exposure to MLOps tooling — MLflow, Weights & Biases, SageMaker, Vertex AI, or equivalent.
  • Experience with LLMs and Generative AI — embeddings, RAG, vector databases (pgvector, Pinecone, Weaviate), and prompt orchestration frameworks.
  • Familiarity with observability tools — Grafana, Prometheus, or Datadog — for data pipelines.
  • Bachelor's or Master's degree in Computer Science, Engineering, Mathematics, or related field.
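
The embeddings/RAG item above reduces to one operation: rank stored vectors by cosine similarity to a query vector. The toy example below is a brute-force sketch of that idea — vector databases like pgvector or Pinecone do the same ranking at scale with approximate-nearest-neighbor indexes; the document names and vectors here are invented for illustration:

```python
import math

def cosine(a: list, b: list) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query: list, docs: dict, k: int = 2) -> list:
    """Exhaustive vector search: score every stored embedding against
    the query and return the k closest document names."""
    scored = sorted(docs.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [name for name, _ in scored[:k]]
```

In a RAG flow, the returned names would map to text chunks that get stuffed into the LLM prompt as retrieved context.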

Job ID: 147485511