Data Engineer / ML Engineer

Location - Gurugram (Onsite)

Salary Budget - Up to ₹18 LPA

Key Responsibilities

  • Design, build, and maintain scalable data pipelines (batch + streaming) using Spark, Hadoop, and other Apache ecosystem tools.
  • Develop robust ETL workflows for large-scale data ingestion, transformation, and validation.
  • Work with Cassandra, Data Lakes, and distributed storage systems to handle large-volume datasets.
  • Write clean, optimized, and modular Python code for data processing, automation, and machine learning tasks.
  • Utilize Linux environments for scripting, performance tuning, and data workflow orchestration.
  • Build and manage web scraping pipelines to extract structured and unstructured data from diverse sources.
  • Collaborate with ML/AI teams to prepare training datasets, manage feature stores, and support the model lifecycle.
  • Implement and experiment with LLMs, LangChain, RAG pipelines, and vector database integrations.
  • Assist in fine-tuning models, evaluating model performance, and deploying ML models into production.
  • Optimize data workflows for performance, scalability, and fault tolerance.
  • Document data flows, transformation logic, and machine learning processes.
  • Work cross-functionally with engineering, product, and DevOps teams to ensure reliable, production-grade data systems.

Requirements

  • 3-6 years of experience as a Data Engineer, ML Engineer, or in a similar role.
  • Strong expertise in Advanced Python (data structures, multiprocessing, async, clean architecture).
  • Solid experience with:
      • Apache Spark / PySpark
      • Hadoop ecosystem (HDFS, Hive, YARN, HBase, etc.)
      • Cassandra or similar distributed databases
      • Linux (CLI tools, shell scripting, environment management)
  • Proven ability to design and implement ETL pipelines and scalable data processing systems.
  • Hands-on experience with data lakes, large-scale storage, and distributed systems.
  • Experience with web scraping frameworks (BeautifulSoup, Scrapy, Playwright, etc.).
  • Familiarity with LangChain, LLMs, RAG, vector stores (FAISS, Pinecone, Milvus), and ML workflow tools.
  • Understanding of model training, fine-tuning, and evaluation workflows.
  • Strong problem-solving skills, with the ability to dive deep into complex data issues and write production-ready code.
  • Experience with cloud environments (AWS/GCP/Azure) is a plus.

Job ID: 136915505