People Prime Worldwide

Senior Infrastructure Automation Engineer

  • Posted 12 days ago

Job Description

Senior Infrastructure Automation Engineer (Zero-Touch GPU Cloud Build & Upgrade)

We are looking for a Senior Infrastructure Automation Engineer with 10+ years of hands-on experience in building and scaling infrastructure automation systems to lead the design and implementation of a Zero-Touch Build, Upgrade, and Certification framework for our on-prem GPU cloud environment. This role demands deep technical expertise across bare-metal provisioning, configuration management, and full-stack automation, from hardware to Kubernetes, built entirely on GitOps principles.

Key Responsibilities

  • Architect, lead, and implement a fully automated, zero-touch deployment pipeline for GPU cloud infrastructure spanning the hardware, OS, Kubernetes, and platform layers.
  • Build robust GitOps-based workflows to manage the end-to-end infrastructure lifecycle, from provisioning to continuous compliance.
  • Design and maintain automation for:
      • Bare-metal control: power cycling, provisioning, remote installs
      • Firmware and configuration flashing: BIOS, NIC, RAID, etc.
      • Hardware inventory management
      • Configuration drift detection and remediation
  • Develop and extend internal automation frameworks using Ansible, Python, and related infrastructure tooling.
  • Serve as a technical authority and mentor, guiding junior engineers and collaborating cross-functionally with hardware, SRE, and platform engineering teams.
  • Lead architectural and design reviews for infrastructure automation systems.
  • Define and implement best practices for infrastructure as code, compliance, and operational resilience.
  • Champion automation-driven operational models and reduce manual intervention to near-zero.
  • Bonus: Familiarity with Terraform, Chef, and Cloud Automation Platforms.

Required Skills & Experience

  • 10+ years of hands-on experience in infrastructure engineering, automation, and systems design, with a strong track record of delivering scalable and maintainable solutions.
  • Primary skills required: Ansible, Python, ipmitool, firmware scripting, and Linux shell scripting.
  • Deep expertise in:
      • Ansible for automation and configuration management
      • Python for scripting, integration, and automation logic
      • ipmitool and related tools for low-level hardware management (e.g., IPMI, Redfish)
  • Proven experience with bare-metal automation in data center environments, including:
      • Power control and PXE booting
      • BIOS/NIC/RAID firmware upgrades
      • Hardware and platform inventory systems
  • Strong foundation in Linux systems, networking, and Kubernetes infrastructure.
  • Fluency with GitOps workflows and tools.
  • Experience with CI/CD systems and managing Git-based pipelines for infrastructure.
  • Familiarity with infrastructure monitoring, logging, and drift detection.
  • Strong cross-team collaboration and communication skills, especially across hardware, platform, and SRE teams.
  • Bonus:
      • Prior leadership or mentorship roles
      • Experience contributing to or maintaining open-source infrastructure projects
      • Exposure to GPU-based compute stacks and high-performance workloads

Job ID: 144846417