Job Description

AIOps Architecture, IT Service Management (ITSM), Observability & Monitoring, Machine Learning, Site Reliability Engineering (SRE)

Description

GSPANN is hiring an AIOps Architect to design and lead enterprise AIOps architecture across observability, incident management, automation, and autonomous remediation. The role focuses on integrating Machine Learning, IT Service Management (ITSM) platforms, and Site Reliability Engineering (SRE) practices to build scalable, self-healing operational ecosystems.

Location: Hyderabad

Role Type: Full Time

Published On: 3 March 2026

Experience: 10+ Years

Role and Responsibilities

  • Design and implement end-to-end AIOps architecture covering observability, incident lifecycle management, anomaly detection, root cause analysis (RCA), and autonomous remediation.
  • Define and maintain AIOps strategy, reference architecture, governance frameworks, and operational blueprints.
  • Architect agentic and generative AI patterns for IT Operations, Data Operations, and Platform Operations.
  • Design unified observability frameworks spanning logs, metrics, traces, alerts, and events.
  • Build scalable event ingestion, correlation, and anomaly detection pipelines using ML and AI models.
  • Develop confidence-scored, controlled auto-remediation and auto-correction workflows.
  • Create orchestration layers to enable self-healing infrastructure, applications, and data pipelines.
  • Integrate runbooks, standard operating procedures (SOPs), and AI-driven virtual agents to automate L1 and L2 operations.
  • Integrate AIOps platforms with ITSM tools (ServiceNow, Jira), Configuration Management Database (CMDB), asset inventory systems, and enterprise observability stacks.
  • Collaborate with Data Engineering, Cloud, Infrastructure, Security, and Application teams to operationalize AIOps capabilities.
  • Align AIOps solutions with SRE principles, ITSM processes, and enterprise reliability objectives.
  • Serve as a trusted technical advisor to leadership, contributing to roadmaps, quarterly business reviews (QBRs), and transformation programs.
  • Mentor engineering and operations teams on AIOps architecture, automation, and observability best practices.
  • Drive adoption through documentation, playbooks, governance models, and enablement initiatives.

Skills and Experience

  • 10+ years of experience in IT Operations, SRE, DevOps, or related domains, including 3-5 years architecting AIOps solutions.
  • Hold certifications in Cloud (AWS/Azure/GCP), SRE, ITIL, Observability platforms, or related disciplines (preferred).
  • Demonstrate strong hands-on experience with AIOps platforms, observability stacks, and monitoring ecosystems such as Prometheus, Grafana, Elastic, Dynatrace, or New Relic.
  • Possess experience operationalizing ML models through MLOps practices.
  • Apply deep understanding of logs, metrics, traces, event pipelines, and distributed system architectures.
  • Design and implement Machine Learning, Generative AI, and agent-based architectures for operational automation.
  • Build anomaly detection models, predictive alerting systems, and advanced RCA frameworks.
  • Develop event correlation engines and automated remediation workflows.
  • Possess strong expertise in infrastructure (compute, storage, networking), cloud platforms (AWS, Azure, GCP), and Kubernetes ecosystems.
  • Apply DevOps, CI/CD, SRE, and automation frameworks in enterprise environments.
  • Integrate AIOps capabilities with ITSM and CMDB platforms using enterprise integration patterns.
  • Design scalable, resilient, modular, and secure AIOps architectures.
  • Create reference architectures, governance models, and operational blueprints for enterprise adoption.
  • Apply data engineering principles, including ETL/ELT pipelines and data quality frameworks.
  • Lead engineering transformation initiatives and drive operational excellence across complex environments.
  • Demonstrate strong analytical thinking, problem-solving, and stakeholder management skills.

More Info

Job ID: 144030219