
  • Posted 8 hours ago

Job Description

We are seeking a skilled and experienced Platform Engineer/Architect to lead the setup, ongoing development, and maintenance of a robust on-premises environment for hosting open-source large language models. This role involves designing and implementing scalable, secure, and efficient infrastructure that meets the specific needs of large-scale AI models.

HOW YOU WILL CONTRIBUTE AND WHAT YOU WILL LEARN

  • Design and architect a scalable and secure on-premises hosting environment for large language models.
  • Develop and implement infrastructure automation tools for efficient management and deployment.
  • Ensure high availability and disaster recovery capabilities.
  • Optimize the hosting environment for maximum performance and efficiency.
  • Implement monitoring tools to track system performance and resource utilization.
  • Regularly update the infrastructure to incorporate the latest technological advancements.
  • Establish robust security protocols to protect sensitive data and model integrity.
  • Ensure compliance with data protection regulations and industry standards.
  • Conduct regular security audits and vulnerability assessments.
  • Work closely with AI/ML teams to understand their requirements and provide suitable infrastructure solutions.
  • Provide technical guidance and support to internal teams and stakeholders.
  • Stay abreast of emerging trends in AI infrastructure and large language model hosting.
  • Manage physical and virtual resources to ensure optimal allocation and utilization.
  • Forecast resource needs and plan for future expansion and upgrades.

KEY SKILLS AND EXPERIENCE

  • Bachelor's or Master's degree in Computer Science, Information Technology, or a related field with 7-12 years of experience.
  • Proven experience in infrastructure architecture, with exposure to AI/ML environments.
  • Experience with inference frameworks such as TGI, TEI, LoRAX, and S-LoRA.
  • Experience with training frameworks such as PyTorch and TensorFlow.
  • Proven experience hosting open-source models on-premises (e.g., Llama 3, Mistral).
  • Strong knowledge of networking, storage, and computing technologies.
  • Experience working with container orchestration tools (e.g., Kubernetes on Red Hat OS).
  • Proficiency in Python programming.
  • Familiarity with open-source large language models and their hosting requirements.
  • Excellent problem-solving and analytical skills.
  • Strong communication and collaboration abilities.

More Info

Open to candidates from: Indian

Job ID: 106993937
