
Job Location: Singapore (Onsite)
Job Summary:
We are looking for a GPU / AI Infrastructure Engineer with 5–7 years of experience to build, optimize, and support scalable AI/ML and HPC environments. The ideal candidate will have strong expertise in GPU acceleration, containerized workloads, and MLOps pipelines, along with hands-on experience managing AI infrastructure across on-prem or cloud platforms.
Key Responsibilities
· Design, deploy, and manage GPU-enabled infrastructure for AI/ML and HPC workloads.
· Install, configure, and optimize GPU software stacks including NVIDIA AI Enterprise, CUDA, ROCm, OpenCL, and NVIDIA NIM.
· Support GPU acceleration for machine learning frameworks and scientific applications.
· Build and manage containerized environments using Docker, Kubernetes (K8s), and Singularity.
· Deploy and manage Kubernetes GPU workloads using GPU Operator and related ecosystem tools.
· Support ML frameworks such as TensorFlow, PyTorch, Scikit-learn, and MXNet.
· Develop and maintain MLOps pipelines using MLflow and Kubeflow.
· Design and implement Infrastructure as Code (IaC) solutions for AI/ML pipelines.
· Automate infrastructure provisioning using Terraform, Pulumi, and CloudFormation.
· Build and maintain CI/CD pipelines for ML model deployment and infrastructure automation.
· Collaborate with data scientists and engineers to optimize model performance and resource utilization.
· Monitor GPU utilization and system performance, and troubleshoot issues across the stack.
· Ensure scalability, reliability, and security of AI infrastructure environments.
Required Skills & Qualifications
· 5–7 years of experience in AI/ML infrastructure, HPC, or DevOps engineering roles.
· Strong experience with GPU technologies and acceleration frameworks (CUDA, ROCm, OpenCL).
· Hands-on experience with the NVIDIA AI Enterprise stack and GPU ecosystem tools (e.g., NVIDIA NIM, GPU Operator).
· Proficiency in container technologies: Docker, Kubernetes, and Singularity.
· Experience working with ML frameworks: TensorFlow, PyTorch, Scikit-learn, MXNet.
· Solid understanding of MLOps tools such as MLflow and Kubeflow.
· Expertise in Infrastructure as Code (Terraform, Pulumi, CloudFormation).
· Experience building and managing CI/CD pipelines for ML or infrastructure workflows.
· Strong scripting skills (Python, Bash, or similar).
· Familiarity with Linux-based environments.
TekWissen’s Staffing division is a recruitment-centric organization focused on providing talent acquisition services (both IT and non-IT) in the Technology, Engineering, Clinical, Legal, Scientific, Finance, Marketing, Professional and Payroll Management arenas to clients across the US and India. Founded in 2009, TekWissen is one of the fastest-growing staffing firms in the United States. We have been recognized on the Inc. 5000 list of fastest-growing companies in the USA with a ranking of #192, and named #15 Top IT Service Company in 2014 by Inc.com, #6 Top Michigan Company in 2014 by Inc.com, one of the Michigan 50 Companies to Watch in 2014, and a FastTrack Award recipient for 2014.
Job ID: 147076237