WFH - US Shift Timing
Salary: 25-30 LPA
Joining: Within 30 days
Job Description
The LLM Framework Engineer designs, develops, and deploys scalable AI/ML and GenAI solutions centered on Large Language Models (LLMs). In this hands-on role, the engineer builds robust frameworks for model development and evaluation, using tools such as Langflow for agentic and retrieval-augmented generation (RAG) applications and Datadog for LLM observability. The engineer collaborates with business partners, data scientists, and product teams to deliver innovative, reliable, and compliant AI solutions that drive business value.
Expectations
- Model Development & Delivery
  - Lead the design, implementation, and deployment of LLM-powered applications, taking models from concept to production.
  - Build and maintain frameworks for prompt engineering, model evaluation, and workflow orchestration (e.g., Langflow, LangChain).
  - Ensure rigorous testing, validation, and quality control of generative AI models, using tools such as DeepEval.
- Observability & Monitoring
  - Integrate Datadog LLM Observability to monitor, troubleshoot, and optimize LLM application performance, cost, and reliability.
  - Instrument applications to capture end-to-end traces, token usage, latency, errors, and operational metrics.
  - Build dashboards and alerts for model drift, performance anomalies, and GenAI metrics.
- Research & Continuous Improvement
  - Stay current with generative AI techniques, frameworks, and open-source libraries (e.g., Hugging Face, TensorFlow, PyTorch).
  - Lead and participate in research activities to advance LLM capabilities and best practices.
- Collaboration & Stakeholder Engagement
  - Work closely with cross-functional teams to integrate LLM technologies into products and services.
  - Engage business stakeholders to identify opportunities for innovation and value creation.
- Engineering Excellence
  - Drive adoption of Agile, DevOps, and CI/CD practices for reliable, scalable delivery.
  - Advocate for automated testing, infrastructure as code, and continuous monitoring.
  - Conduct code reviews and enforce secure coding practices.
Skill Set
Technical Skills
- Strong hands-on experience in machine learning and GenAI, delivering complex solutions to production.
- Deep expertise in LLM frameworks and orchestration tools: Langflow, LangChain, MLflow, DeepEval.
- Experience with Datadog LLM Observability for monitoring, tracing, and evaluating LLM applications.
- Proficiency in Python software development, following object-oriented design patterns and best practices.
- Experience with deep learning frameworks: TensorFlow, PyTorch.
- Familiarity with Hugging Face Transformers, spaCy, pandas, and other open-source libraries.
- Experience with Docker, Kubernetes, and cloud platforms (AWS, Azure, GCP).
- Knowledge of PostgreSQL, vector databases, and scalable data architectures.
- Experience with evaluation and scoring frameworks (MLflow, DeepEval).
- Strong understanding of MLOps, CI/CD, and infrastructure as code.
- Observability & Monitoring
  - Hands-on experience instrumenting LLM applications for observability using Datadog, Langflow,