

KEY RESPONSIBILITIES
• Assist in managing and monitoring GPU cluster health and infrastructure stability
• Support the configuration of compute resources and container orchestration using Kubernetes and Docker
• Help maintain data pipelines for AI model training and inference workloads
• Monitor system performance using tools like Prometheus and Grafana
• Assist in troubleshooting infrastructure issues and documenting resolutions
• Support the team in implementing Infrastructure as Code (IaC) using Terraform or Ansible
• Participate in data preprocessing and storage system maintenance
• Collaborate with ML engineers to understand and support their infrastructure needs
• Learn and apply best practices for cloud architecture (AWS, GCP, or Azure)
• Contribute to technical documentation and knowledge base development
REQUIRED QUALIFICATIONS
• Bachelor's degree in Computer Science, Information Technology, or related field
• 0-10 years of experience in DevOps, cloud infrastructure, or related roles
• Basic understanding of Linux systems and command-line operations
• Familiarity with Python or other scripting languages
• Exposure to containerization technologies (Docker, Kubernetes)
• Basic knowledge of cloud platforms (AWS, GCP, or Azure)
• Strong analytical and problem-solving skills
• Excellent communication and teamwork abilities
• Eagerness to learn and adapt to new technologies
PREFERRED QUALIFICATIONS
• Experience with GPU computing or HPC environments
• Familiarity with machine learning concepts and workflows
• Knowledge of monitoring tools (Prometheus, Grafana)
• Understanding of networking fundamentals
• Relevant certifications in cloud platforms or Kubernetes
Job ID: 148905237