Overview
We are seeking an experienced ITOps Consultant – Monitoring & Observability to design, implement, and operate enterprise-grade monitoring solutions. This role focuses on ensuring high availability, performance, and reliability of IT infrastructure and applications through modern observability practices.
The ideal candidate will have 10+ years of hands-on experience in monitoring and observability platforms, with OpsRamp as a primary or preferred tool. Candidates with strong experience in Datadog or Dynatrace and proven capability to integrate monitoring tools with ITSM platforms are also encouraged to apply.
This is a rotational shift role supporting 24x7 operations.
Primary Responsibilities
· Deploy, configure, and operate OpsRamp as the core monitoring platform, including onboarding devices, applications, and services.
· For non-OpsRamp profiles, quickly adapt and transfer experience from tools such as the LGTM stack, Datadog, OpenText monitoring tools, New Relic, or any other SaaS monitoring/observability tool into the OpsRamp ecosystem.
· Integrate monitoring platforms with IT Service Management (ITSM) tools (e.g., ServiceNow, BMC Remedy) for incident, event, and alert management.
· Develop and maintain dashboards, alerts, SLIs/SLOs, and reports to ensure proactive issue detection and faster incident resolution.
· Tune alert thresholds and correlation rules to reduce alert noise and improve signal quality.
· Support hybrid and multi-cloud environments, including on-prem, cloud, and containerized platforms.
· Collaborate with Infrastructure, Application, and DevOps teams to integrate monitoring into CI/CD pipelines and operational workflows.
· Automate monitoring, alerting, remediation, and reporting using scripts, APIs, and orchestration tools.
· Leverage AIOps capabilities for anomaly detection, event correlation, and predictive insights.
· Participate in rotational shifts and support operational monitoring, incident triage, and root cause analysis (RCA).
· Document monitoring architectures, runbooks, configurations, and standard operating procedures (SOPs).
Required Skills
Mandatory / Core Skills
- 10+ years of experience in IT Operations, Monitoring, or Observability roles.
- Hands-on experience with OpsRamp for monitoring deployment, configuration, and operations, OR strong experience with the LGTM stack, Datadog, OpenText monitoring tools, New Relic, or any other SaaS monitoring/observability tool, with readiness to work on OpsRamp.
- Proven experience integrating monitoring tools with ITSM platforms.
- Strong understanding of metrics, logs, traces, and observability best practices.
Technical Skills
- Experience with open-source monitoring tools:
  - Prometheus, Grafana
  - ELK Stack (Elasticsearch, Logstash, Kibana), Fluentd
  - Tracing tools such as Jaeger or Zipkin
- Working knowledge of REST APIs and API-based integrations.
- Scripting/automation experience using Python, Ansible, or similar tools.
- Familiarity with AIOps concepts, anomaly detection, and intelligent alerting.
- Understanding of ITIL processes and service management frameworks.
- Exposure to security monitoring and compliance requirements is a plus.
Soft Skills
- Strong analytical and troubleshooting skills for complex production issues.
- Ability to work effectively in 24x7 rotational shifts.
- Good communication skills and ability to work with cross-functional teams and business stakeholders.
- Ownership mindset with a focus on reliability and operational excellence.
Nice to Have
- OpsRamp certification or hands-on production deployment experience.
- Experience monitoring Kubernetes/OpenShift environments.