AlgoLeap

Monitoring & Observability - Architect

10-12 Years
  • Posted a day ago

Job Description

Overview

We are seeking an experienced ITOps Consultant – Monitoring & Observability to design, implement, and operate enterprise-grade monitoring solutions. This role focuses on ensuring high availability, performance, and reliability of IT infrastructure and applications through modern observability practices.

The ideal candidate will have 10+ years of hands-on experience in monitoring and observability platforms, with OpsRamp as a primary or preferred tool. Candidates with strong experience in Datadog or Dynatrace and proven capability to integrate monitoring tools with ITSM platforms are also encouraged to apply.

This is a rotational shift role supporting 24x7 operations.

Primary Responsibilities

· Deploy, configure, and operate OpsRamp as the core monitoring platform, including onboarding devices, applications, and services.

· Candidates from a non-OpsRamp background are expected to adapt quickly and carry their experience with tools such as the LGTM stack, Datadog, OpenText monitoring tools, New Relic, or any other SaaS monitoring/observability tool over to the OpsRamp ecosystem.

· Integrate monitoring platforms with IT Service Management (ITSM) tools (e.g., ServiceNow, BMC Remedy) for incident, event, and alert management.

· Develop and maintain dashboards, alerts, SLIs/SLOs, and reports to ensure proactive issue detection and faster incident resolution.

· Tune alert thresholds and correlation rules to reduce alert noise and improve signal quality.

· Support hybrid and multi-cloud environments, including on-prem, cloud, and containerized platforms.

· Collaborate with Infrastructure, Application, and DevOps teams to integrate monitoring into CI/CD pipelines and operational workflows.

· Automate monitoring, alerting, remediation, and reporting using scripts, APIs, and orchestration tools.

· Leverage AIOps capabilities for anomaly detection, event correlation, and predictive insights.

· Participate in rotational shifts and support operational monitoring, incident triage, and root cause analysis (RCA).

· Document monitoring architectures, runbooks, configurations, and standard operating procedures (SOPs).

Required Skills

Mandatory / Core Skills

  • 10+ years of experience in IT Operations, Monitoring, or Observability roles.
  • Hands-on experience with OpsRamp for monitoring deployment, configuration, and operations.
    • Alternatively, strong experience with the LGTM stack, Datadog, OpenText monitoring tools, New Relic, or any other SaaS monitoring/observability tool, with readiness to work on OpsRamp.
  • Proven experience integrating monitoring tools with ITSM platforms.
  • Strong understanding of metrics, logs, traces, and observability best practices.

Technical Skills

  • Experience with open-source monitoring tools:
    • Prometheus, Grafana
    • ELK Stack (Elasticsearch, Logstash, Kibana), Fluentd
    • Tracing tools such as Jaeger or Zipkin
  • Working knowledge of REST APIs and API-based integrations.
  • Scripting/automation experience using Python, Ansible, or similar tools.
  • Familiarity with AIOps concepts, anomaly detection, and intelligent alerting.
  • Understanding of ITIL processes and service management frameworks.
  • Exposure to security monitoring and compliance requirements is a plus.

Soft Skills

  • Strong analytical and troubleshooting skills for complex production issues.
  • Ability to work effectively in 24x7 rotational shifts.
  • Good communication skills and ability to work with cross-functional teams and business stakeholders.
  • Ownership mindset with a focus on reliability and operational excellence.

Nice to Have

  • OpsRamp certification or hands-on production deployment experience.
  • Experience monitoring Kubernetes / OpenShift environments.

Job ID: 149363855