
Description:
We are looking for an experienced Network Automation Engineer to design, implement, and optimize automation solutions for our Private Cloud datacenter network, which underpins large-scale AI/ML GPU and TPU workloads. The role focuses on automating the configuration, provisioning, and monitoring of high-performance networking devices to ensure low latency, high throughput, and reliability in a mission-critical environment. It covers both network device management and OS-level network configuration on servers. Expertise in Ansible and Python is essential, and experience with GoLang is a strong plus.
Key Responsibilities:
Develop and maintain network automation frameworks for large-scale datacenter environments supporting AI/ML workloads.
Build Ansible playbooks, roles, and modules to automate device configurations, software upgrades, and compliance checks across multi-vendor environments.
Design and implement Python-based automation scripts and tools to integrate with APIs, orchestration platforms, and monitoring systems.
Automate OS core networking configurations on servers (Linux / Windows / Hypervisor) including bonding, VLANs, routing tables, kernel network parameters, MTU tuning, and NIC performance optimization.
Collaborate with cloud infrastructure, network engineering, and DevOps teams to deliver seamless provisioning and scaling of GPU/TPU clusters.
Ensure network automation solutions meet high-performance computing (HPC) requirements such as low latency, high throughput, and fault tolerance.
Participate in network architecture reviews to provide automation insights and recommendations.
Document automation processes, workflows, and operational guidelines for the datacenter network.
Stay updated on emerging technologies in network automation, SDN, and private cloud networking.
Required Skills & Experience:
Expertise in Ansible (playbook development, dynamic inventory, custom modules) for large-scale network automation.
Strong proficiency in Python for scripting, API integrations (REST, NETCONF, gNMI), and device interaction (e.g., NAPALM, Netmiko, Paramiko).
Hands-on experience with high-performance datacenter networking devices (Cisco Nexus, Arista, Juniper, Mellanox/NVIDIA Networking).
Knowledge of Linux / Windows / Hypervisor OS core networking.
Deep understanding of networking concepts including BGP, EVPN-VXLAN, MPLS, QoS, and leaf-spine architectures.
Experience in Private Cloud environments with a focus on supporting HPC/AI workloads.
Familiarity with CI/CD pipelines (GitLab, Jenkins) for deploying automation at scale.
Knowledge of network observability, telemetry, and streaming protocols and tools (gRPC, sFlow, SNMP, InfluxDB, Prometheus).
Strong problem-solving skills and ability to operate in a high-availability, mission-critical datacenter environment.
Good to Have:
GoLang experience for building scalable and high-performance automation tools.
Familiarity with Infrastructure-as-Code (IaC) tools like Terraform or Pulumi.
Exposure to Kubernetes networking (CNI plugins) and containerized workloads.
Understanding of AI/ML workload characteristics and their impact on network design and performance.
LTIMindtree is a global technology consulting and digital solutions company that enables enterprises across industries to reimagine business models, accelerate innovation, and maximize growth by harnessing digital technologies. As a digital transformation partner to more than 750 clients, LTIMindtree brings extensive domain and technology expertise to help drive superior competitive differentiation, customer experiences, and business outcomes in a converging world. Powered by more than 90,000 talented and entrepreneurial professionals across 30 countries, LTIMindtree, a Larsen & Toubro Group company, combines the industry-acclaimed strengths of erstwhile L&T Infotech and Mindtree in solving the most complex business challenges and delivering transformation at scale.
For more, please visit www.ltimindtree.com.
Job ID: 130712883