
Yugen.ai

Machine Learning Engineer

  • Posted a day ago

Job Description

We're hiring an ML Engineer to work alongside Data Scientist(s) and support a leading client in the ad tech domain. You will own the infrastructure, low-latency APIs, data pipelines, deployment, and reliability of recommendation and ranking models in production. You'll be the bridge between data science and engineering: taking prototypes from the Data Scientist and turning them into robust, low-latency, high-availability services that operate at ad-tech scale. You should be comfortable with asynchronous communication (written updates, docs, Slack-style collaboration) with both the client and our internal team across time zones.

The candidate will have responsibilities across the following functions:

Model Productionization And Serving

  • Design, build, and maintain low-latency APIs for serving recommendation and ranking models.
  • Take Data Scientist-built models (in Python) and productionize them for real-time or near-real-time serving.
  • Implement and maintain model serving endpoints (e.g., using SageMaker, Vertex AI, custom Docker/Kubernetes-based services, or similar); a minimal serving sketch follows this list.
  • Optimise for low latency and high throughput, suitable for ad-serving workloads.
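For illustration only, here is a minimal sketch of the kind of low-latency scoring endpoint described above, assuming a FastAPI service wrapping a pickled Python model. The framework choice and all names (ranker.pkl, ScoreRequest, /score) are assumptions for this sketch, not part of the client's actual stack.

    # Minimal sketch of a low-latency scoring endpoint (illustrative only).
    # Assumes a pickled scikit-learn-style binary ranker at MODEL_PATH; the
    # path, endpoint, and request schema are hypothetical.
    import pickle
    from pathlib import Path

    from fastapi import FastAPI
    from pydantic import BaseModel

    MODEL_PATH = Path("models/ranker.pkl")  # hypothetical artifact from the Data Scientist

    app = FastAPI()

    # Load the model once at startup so each request only pays for inference.
    with MODEL_PATH.open("rb") as f:
        model = pickle.load(f)


    class ScoreRequest(BaseModel):
        # One feature vector per candidate ad/item to be ranked.
        candidates: list[list[float]]


    class ScoreResponse(BaseModel):
        scores: list[float]


    @app.post("/score", response_model=ScoreResponse)
    def score(req: ScoreRequest) -> ScoreResponse:
        # predict_proba returns [p(negative), p(positive)] per row for a binary ranker.
        scores = model.predict_proba(req.candidates)[:, 1].tolist()
        return ScoreResponse(scores=scores)

In practice such a service would be run behind an ASGI server (e.g., uvicorn) and load-tested against the client's latency and throughput budgets before taking ad-serving traffic.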

Feature Pipelines And Data Engineering

  • Design and build feature pipelines for training and inference:
    • Batch pipelines using tools like Airflow, dbt, Beam, or Spark.
    • Streaming / real-time features using Kafka, Pub/Sub, etc.
  • Design, integrate with, or operate an online feature store to serve low-latency features for real-time scoring (a minimal sketch follows this list).
  • Ensure training-serving skew is minimised; maintain clear contracts for feature definitions and data schemas.
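The posting does not specify the client's feature store, so the following is only a sketch of the online lookup pattern, assuming Redis hashes as the online store; the key layout and feature names are hypothetical. The shared FEATURE_COLUMNS list stands in for the feature contract that keeps the training pipeline and the serving path aligned.

    # Illustrative sketch of an online feature lookup backed by Redis hashes.
    # Redis, the key schema, and all feature names here are assumptions.
    import redis

    # Single source of truth for feature names, shared by the offline training
    # pipeline and the serving path to limit training-serving skew.
    FEATURE_COLUMNS = ["ctr_7d", "impressions_7d", "last_click_age_hours"]

    r = redis.Redis(host="localhost", port=6379, decode_responses=True)


    def write_user_features(user_id: str, features: dict[str, float]) -> None:
        """Called from the batch/streaming pipeline after feature computation."""
        r.hset(f"features:user:{user_id}",
               mapping={col: features[col] for col in FEATURE_COLUMNS})


    def read_user_features(user_id: str) -> list[float]:
        """Called on the request path; falls back to 0.0 for missing features."""
        raw = r.hgetall(f"features:user:{user_id}")
        return [float(raw.get(col, 0.0)) for col in FEATURE_COLUMNS]

The same FEATURE_COLUMNS definition would be imported by the offline training code, so both sides read identical feature definitions and schemas.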

Infrastructure And MLOps

  • Implement CI/CD for ML models and pipelines (e.g., GitHub Actions, GitLab CI, Cloud Build, etc.).
  • Manage containerization and deployment using Docker and Kubernetes (or managed equivalents).
  • Set up and maintain model versioning, configuration management, and rollback strategies (a minimal sketch follows this list).
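As one possible shape for version pinning and rollback, here is a small sketch based on a manifest file that the serving process reads; the real setup would use the client's registry and CD tooling, and the file name and manifest format here are assumptions.

    # Illustrative sketch of version pinning and rollback via a manifest file.
    # The manifest path and format are hypothetical.
    import json
    from pathlib import Path

    MANIFEST = Path("model_manifest.json")  # e.g. {"current": "v7", "previous": "v6"}


    def current_model_version() -> str:
        """The serving process reads this at startup (or on a config reload)."""
        return json.loads(MANIFEST.read_text())["current"]


    def promote(new_version: str) -> None:
        """CD step: promote a newly validated model version to serving."""
        state = json.loads(MANIFEST.read_text())
        state["previous"], state["current"] = state["current"], new_version
        MANIFEST.write_text(json.dumps(state, indent=2))


    def rollback() -> None:
        """Repoint serving to the last known-good version."""
        state = json.loads(MANIFEST.read_text())
        state["current"], state["previous"] = state["previous"], state["current"]
        MANIFEST.write_text(json.dumps(state, indent=2))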

Monitoring, Observability And Reliability

  • Work with the Data Scientist to define metrics and implement monitoring for:
    • Model performance (prediction distribution, drift, business KPIs).
    • System performance (latency, error rates, resource utilisation).
    • Data quality (schema checks, nulls, outliers, volume anomalies).
  • Build alerting and logging using the client's stack (e.g., Prometheus, Grafana, Cloud Monitoring, CloudWatch, etc.); a minimal instrumentation sketch follows this list.
  • Investigate and resolve production issues, from infrastructure to data to model-related problems.
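For illustration, here is a minimal instrumentation sketch using the prometheus_client library, covering the three monitoring angles above (latency, errors, and prediction distribution as a simple drift signal). The metric names, port, and label values are assumptions; the client's actual stack determines the real setup.

    # Illustrative sketch of request-level instrumentation with prometheus_client.
    # Metric names, the port, and label values are hypothetical.
    import time

    from prometheus_client import Counter, Histogram, start_http_server

    REQUEST_LATENCY = Histogram(
        "model_request_latency_seconds", "End-to-end scoring latency", ["model_version"])
    PREDICTION_SCORE = Histogram(
        "model_prediction_score", "Distribution of predicted scores (drift signal)",
        ["model_version"], buckets=[i / 10 for i in range(11)])
    REQUEST_ERRORS = Counter(
        "model_request_errors_total", "Scoring failures", ["model_version"])

    start_http_server(9100)  # exposes /metrics for Prometheus to scrape


    def score_with_metrics(model, features, model_version="v1"):
        start = time.perf_counter()
        try:
            score = float(model.predict_proba([features])[0][1])
            PREDICTION_SCORE.labels(model_version).observe(score)
            return score
        except Exception:
            REQUEST_ERRORS.labels(model_version).inc()
            raise
        finally:
            REQUEST_LATENCY.labels(model_version).observe(time.perf_counter() - start)

Dashboards and alert rules (e.g., in Grafana or Cloud Monitoring) would then be built on top of these series, alongside data-quality checks in the pipelines themselves.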

Experimentation Platform Support

  • Integrate models with the client's A/B testing/experimentation framework.
  • Implement traffic splits, routing logic, and variant toggles (feature flags); a minimal bucketing sketch follows this list.
  • Ensure metrics and logs needed for experiment analysis are correctly captured and accessible.
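As a sketch of deterministic traffic splitting, the snippet below hashes (experiment, user) pairs into buckets so each user consistently sees the same variant. The experiment name, variant weights, and hashing scheme are assumptions; in practice this logic would live inside, or integrate with, the client's experimentation framework.

    # Illustrative sketch of deterministic variant assignment for an A/B test.
    # Experiment name and variant weights are hypothetical.
    import hashlib

    VARIANT_WEIGHTS = {"control": 0.5, "ranker_v2": 0.5}


    def assign_variant(user_id: str, experiment: str = "ranking_model_test") -> str:
        """Hash (experiment, user) into [0, 1) so a user always gets the same variant."""
        digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
        bucket = int(digest[:8], 16) / 0xFFFFFFFF  # roughly uniform value in [0, 1]
        cumulative = 0.0
        for variant, weight in VARIANT_WEIGHTS.items():
            cumulative += weight
            if bucket <= cumulative:
                return variant
        return "control"  # fallback for floating-point edge cases

    # Example: route the request and log the assignment for experiment analysis.
    variant = assign_variant("user_123")
    print({"user_id": "user_123", "variant": variant})  # would go to structured logs

Logging the assignment alongside prediction and outcome events is what makes the downstream experiment analysis possible.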

Collaboration And Client Interaction

  • Work closely with the Data Scientist to understand modelling assumptions and requirements.
  • Collaborate with the client Product and Engineering teams to align on SLAs, integration points, and architectural choices.
  • Participate in technical discussions with client partners; communicate trade-offs and propose pragmatic solutions.
  • Provide clear async updates (tickets, comments, design docs, status summaries) so both the client and internal teams stay aligned without needing constant meetings.

Requirements

  • Experience: 2-5 years as an ML Engineer / Data Engineer / Software Engineer working on ML-heavy systems.
  • Programming: Strong skills in Python.
  • Cloud: Hands-on experience with GCP or AWS.
  • Data Engineering: Experience building and operating data pipelines (batch and/or streaming) using tools like Airflow, dbt, Beam, Spark, or similar.
  • MLOps / Infra: Experience with:
    • Containerization (Docker) and orchestration (Kubernetes or managed alternatives).
    • CI/CD for services or ML workflows.
    • Monitoring/logging tools (Prometheus, Grafana, CloudWatch, Stackdriver, etc.).
  • Collaboration and Communication:
    • Comfortable working in a remote, async-first environment: writing good design docs, giving structured written updates, and collaborating over Slack/email/tickets with distributed teams.

Nice-to-Have

  • Experience with real-time / low-latency systems, especially in ad tech, recommendation, ranking, or search.
  • Familiarity with feature stores and online feature serving.
  • Familiarity with online experimentation frameworks and traffic routing for A/B tests.
  • Familiarity with model registries and ML platforms (e.g., MLflow, SageMaker, Vertex AI pipelines).
  • Comfort reading Data Scientist code/notebooks and refactoring them into clean, production-ready modules.

This job was posted by Akshay Singh from Yugen.ai.


Job ID: 134396737
