
Infrastructure Architect


Job Description

Project Role: Infrastructure Architect

Project Role Description: Lead the definition, design and documentation of technical environments. Deploy solution architectures, conduct analysis of alternative architectures, create architectural standards, define processes to ensure conformance with standards, institute solution-testing criteria, define a solution's cost of ownership, and promote a clear and consistent business vision through technical architectures.

Must-have skills: Infrastructure Automation

Good-to-have skills: NA

Minimum 12 years of experience is required

Educational Qualification: 15 years of full-time education

Summary

As an Infrastructure Architect, a typical day involves leading the design and documentation of complex technical environments that support organizational goals. This role requires deploying solution architectures and evaluating various architectural alternatives to determine the most effective approach. The professional ensures that architectural standards are clearly defined and adhered to, while establishing processes that maintain consistency and quality across projects. Additionally, the role includes setting criteria for solution testing and assessing the total cost of ownership for proposed solutions. Throughout the day, the Infrastructure Architect works to align technical strategies with the broader business vision, fostering clarity and coherence in architectural decisions.

Key Responsibilities

Design and implement HPC and AI infrastructure solutions, aligning system architecture and deployment roadmaps to industry-specific performance and scalability needs

Deploy, configure, and manage XPU-based clusters (CPU/GPU/accelerators) using workload schedulers such as Slurm, VM and Kubernetes orchestration platforms, and containerized platforms, in scalable designs that deliver Metal as a Service (MaaS), GPUaaS, AIaaS, and other offerings

Optimize cluster performance, scalability, energy, and cost efficiency across on-premises, cloud, and hybrid environments

Integrate AI and HPC platforms with existing IT systems, data pipelines, and security frameworks

Monitor, troubleshoot, and tune infrastructure to ensure high availability, low-latency networking, and workload resiliency

Develop and maintain documentation including architecture diagrams, configuration baselines, and operational runbooks

Provide technical guidance and support to users, enabling efficient execution of HPC/AI workloads, large-scale models, and simulations.

Required Skills And Qualifications

Proven ability to advise and engage with C-Suite executives and senior leadership, translating complex AI and HPC technologies into business and strategic value

Deep knowledge of infrastructure components including XPUs, high-performance fabrics (InfiniBand, Ethernet), and modern storage/data platforms (e.g. NVMe-oF, Lustre, BeeGFS, VAST, DDN, Weka)

Familiarity with orchestration and management frameworks (Slurm, Kubernetes, Docker) and performance/monitoring tools for AI/HPC environments

Strong grasp of MLOps, DevSecOps, and automation principles (Terraform, Ansible) as they apply to large-scale, secure, and reproducible workflows

Experience in agentic AI-based automation, including developing and integrating agents for automation and observability

Excellent communication and client-facing skills, with the ability to present complex architectures to both executives and technical teams.

Preferred Skills And Qualifications

Understanding of cloud and virtualization platforms (AWS, Azure, GCP, VMware, Nutanix) and how to align them with AI/HPC workload requirements

Experience advising or overseeing large-scale AI/HPC deployments (1,000+ GPUs or clusters of 100+ servers), providing architecture and strategic guidance

Familiarity with GPU computing and accelerator ecosystems (NVIDIA CUDA, AMD ROCm) and integration considerations for HPC/AI workloads

Knowledge of AI/ML frameworks (TensorFlow, PyTorch) and their operational and performance implications in HPC/AI environments

Industry experience in Life Sciences, Resources, Automotive, Financial Services, Telecommunications, or other HPC/AI-intensive sectors

Relevant cloud or infrastructure certifications (e.g., AWS Solutions Architect, GCP Professional Data Engineer) or equivalent technical credentials

Experience in workload planning, optimization, and orchestration guidance to align infrastructure with business and research objectives

Demonstrated ability to develop roadmaps, ROI analysis, and architecture recommendations that balance performance, scalability, and cost efficiency

Additional Information

  • The candidate should have a minimum of 12 years of experience in Infrastructure Automation.
  • This position is based at our Gurugram office.
  • 15 years of full-time education is required.

Job ID: 147367479