
Inviting applications for the role of Principal Consultant - AI/ML Engineer
In this role, you will support 24x7 network operations for internal IT. You will work on operational issues with internal and external stakeholders to recover from system failures, both proactively and reactively.
You will handle Incident Management and Problem Management for each incident or ticket and ensure high uptime for the business.
The role is network L2+ and requires working with various current and upcoming technologies (explained below) to support business needs.
Responsibilities
The candidate should have solid hands-on experience in demanding 24x7 network operations.
The candidate will work at the intersection of AI engineering, infrastructure automation, and platform engineering, enabling autonomous decision-making across Compliance, Multi-Cloud, Network DevOps, EUC, and Identity Management.
1. Agentic AI & Automation Development
Design and implement agent-based AI systems for infrastructure operations (planning, reasoning, execution loops)
Build autonomous remediation workflows for incidents, alerts, and service degradation at each autonomy level: human-in-the-loop (HITL), human-on-the-loop (HOTL), and human-out-of-the-loop (HOOTL)
Develop multi-agent orchestration frameworks for cross-domain infrastructure coordination
Integrate LLMs with enterprise systems for context-aware decision-making
2. Infrastructure Automation (Domain-Specific)
Compliance: Continuously monitor policy adherence, detect drift, and enable automated remediation of compliance violations.
Multi-Cloud: Orchestrate MACD, optimization, and cost governance across AWS, Azure, and GCP using IaC and AI agents.
Network DevOps: Automate provisioning, validation, and AI-driven troubleshooting across hybrid networks using telemetry and APIs.
EUC: Build self-healing automation for endpoint performance, application issues, and device lifecycle management.
Identity Management: Automate access provisioning, enforce policies, and detect and respond to identity anomalies.
3. AI/ML & LLM Engineering
Develop and fine-tune LLM-powered agents using frameworks such as CrewAI and LangChain
Build context-aware reasoning systems using telemetry, logs, and configuration data
Ensure guardrails, explainability, and human-in-the-loop (HITL) controls
4. Observability & Telemetry Integration
Integrate with observability platforms
Correlate logs, metrics, and traces for AI-driven insights and decision-making
Build real-time telemetry pipelines for agent consumption
5. Platform & Engineering Integration
Embed agents into CI/CD pipelines, ITSM workflows, and platform engineering stacks
Integrate with ServiceNow and GitOps workflows
Develop reusable automation modules and APIs
6. Governance, Safety & Reliability
Implement AI safety, compliance, and governance frameworks
Define levels of autonomy (HITL, HOTL, HOOTL)
Ensure auditability and traceability of agent actions
Core Technical Skills
Strong programming: Python (mandatory), plus PowerShell / JavaScript
Experience with AI/LLM frameworks (CrewAI, LangChain)
Hands-on with REST APIs, microservices, and event-driven architectures
CI/CD pipelines (GitHub Actions, Azure DevOps)
Infrastructure & Automation
Deep knowledge in at least 2 domains:
Network (SD-WAN, routing, firewalls, proxies)
EUC / Desktop Engineering
Cloud (AWS / Azure / GCP)
AI Engineering
Experience with:
LLMs, RAG pipelines, embeddings, vector DBs
Prompt engineering and agent orchestration
Understanding of:
Autonomous systems design patterns
Planning, reasoning, and tool-use agents
Observability & Data
Familiarity with:
Logging, monitoring, tracing systems
Data pipelines (Kafka, Kinesis)
Qualifications we seek in you!
Minimum Qualifications
Experience building agentic AI systems for IT operations (AIOps)
Exposure to FinOps, SecOps, or compliance automation
Knowledge of Zero Trust architecture
Certifications in cloud (AWS/Azure/GCP) or networking (Cisco, Palo Alto)
Experience in large-scale enterprise environments
Preferred Qualifications / Skills
Systems thinking across infrastructure layers
Strong problem-solving and debugging skills
Ability to design autonomous, resilient systems
Collaboration with cross-functional teams (Infra, Security, DevOps, AI)
Experience with on-prem / edge AI deployments
Knowledge of graph-based reasoning (knowledge graphs) for infra mapping
Genpact (NYSE: G) is a global professional services and solutions firm delivering outcomes that shape the future. Our 125,000+ people across 30+ countries are driven by our innate curiosity, entrepreneurial agility, and desire to create lasting value for clients. Powered by our purpose - the relentless pursuit of a world that works better for people - we serve and transform leading enterprises, including the Fortune Global 500, with our deep business and industry knowledge, digital operations services, and expertise in data, technology, and AI.
Job ID: 145816367