Primary Title: Generative AI Engineer (On-site, India)
Industry: Enterprise AI / Machine Learning services for product and platform engineering. Sector: GenAI-driven solutions, building production-grade large language model (LLM) applications, retrieval-augmented systems and AI-native APIs for enterprise customers.
We are hiring an on-site Generative AI Engineer to design, fine-tune, deploy and maintain scalable GenAI services that power intelligent search, assistants, summarization and automation. This role suits engineers who move fast from prototype to production and own model reliability, inference performance and data privacy at scale.
Role & Responsibilities
- Architect, fine-tune and deploy LLM solutions for retrieval-augmented generation (RAG), summarization and chat assistants, taking them from prototype to production while ensuring performance, cost-efficiency and safety.
- Build data ingestion and preprocessing pipelines that prepare text, compute embeddings and assemble context windows; implement vectorization and retrieval workflows for RAG (a minimal retrieval sketch follows this list).
- Implement prompt engineering, evaluation frameworks and automated model validation to measure accuracy, hallucination rates and response quality (a toy evaluation loop also follows this list).
- Optimize inference throughput and latency through quantization, batching, ONNX export, sharding and accelerated runtimes for GPU/CPU environments.
- Integrate LLMs with backend APIs, LangChain pipelines and vector databases; implement secure data flows, access controls and PII-handling policies.
- Define MLOps best practices: model versioning, CI/CD for models, monitoring, alerting and reproducible training & deployment pipelines.
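To make the retrieval responsibility concrete, here is a minimal sketch of the embedding-and-search step at the core of a RAG pipeline (assuming the sentence-transformers and faiss-cpu packages; the model name and corpus are illustrative, not prescriptive):

```python
# Minimal embedding + retrieval sketch; model and corpus are placeholders.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

corpus = [
    "Reset a user's password from the admin console.",
    "Export quarterly usage reports as CSV.",
    "Rotate service API keys every 90 days.",
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")
# Normalized embeddings let inner product act as cosine similarity.
embeddings = encoder.encode(corpus, normalize_embeddings=True)

index = faiss.IndexFlatIP(embeddings.shape[1])
index.add(np.asarray(embeddings, dtype=np.float32))

query = encoder.encode(["how do I reset a password?"], normalize_embeddings=True)
scores, ids = index.search(np.asarray(query, dtype=np.float32), 2)
for score, i in zip(scores[0], ids[0]):
    print(f"{score:.3f}  {corpus[i]}")  # top-k chunks to splice into the prompt
```

In production this step would sit behind chunking, metadata filtering and an approximate-nearest-neighbor index rather than a flat one.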
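And the evaluation responsibility in miniature: a toy validation loop where `call_model` is a hypothetical stand-in for a deployed inference client, and the substring check is a deliberately naive proxy for real accuracy and hallucination metrics:

```python
# Toy automated-validation loop over a small case set.
cases = [
    {"prompt": "What is the capital of France?", "expected": "paris"},
    {"prompt": "What is 2 + 2?", "expected": "4"},
]

def call_model(prompt: str) -> str:
    # Hypothetical stand-in: replace with the deployed endpoint's client call.
    canned = {
        "What is the capital of France?": "Paris.",
        "What is 2 + 2?": "2 + 2 equals 4.",
    }
    return canned[prompt]

hits = sum(case["expected"] in call_model(case["prompt"]).lower() for case in cases)
print(f"accuracy: {hits}/{len(cases)}")
```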
Skills & Qualifications
Must-Have
- Proven experience building and deploying production GenAI/LLM solutions; hands-on with model fine-tuning and evaluation.
- Strong proficiency in Python for ML engineering and production services.
- Practical experience with PyTorch and HuggingFace Transformers.
- Hands-on with LangChain (or equivalent orchestration frameworks) and vector search architectures (FAISS / Milvus / Weaviate).
- Containerization and deployment experience (Docker) and familiarity with MLOps patterns (CI/CD, model versioning, monitoring).
- Solid understanding of RAG, prompt engineering and embedding strategies.
Preferred
- Experience with Kubernetes for scaling inference workloads and GPU orchestration.
- Familiarity with cloud ML infrastructure (AWS, GCP or Azure) and serverless inference patterns.
- Knowledge of model optimization tools and formats (ONNX, quantization, model distillation) and observability stacks for ML; a quantization sketch follows this list.
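As a flavor of the optimization work above, a minimal sketch of post-training dynamic quantization in PyTorch (the toy MLP stands in for a real transformer module):

```python
# Post-training dynamic quantization: int8 weights, same call interface.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(768, 768), nn.ReLU(), nn.Linear(768, 768)).eval()

# Convert Linear weights to int8; activations are quantized on the fly.
quantized = torch.ao.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

with torch.no_grad():
    out = quantized(torch.randn(1, 768))
print(out.shape)  # torch.Size([1, 768]) -- same interface, smaller weights
```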
Benefits & Culture Highlights
- Hands-on ownership of end-to-end GenAI products and direct impact on roadmap and architecture decisions.
- Collaborative, on-site engineering culture focused on mentorship, technical excellence and continuous learning.
- Opportunities to work with cutting-edge LLMs, production MLOps tooling and enterprise integrations.
Location: On-site, India. We seek passionate engineers who can translate state-of-the-art generative AI research into robust, secure and low-latency production services. Apply if you thrive on building scalable LLM systems and delivering measurable enterprise impact.
Skills: ML, Generative AI, GCP, AI/ML, AWS, Docker, Python, Kubernetes