Job Description
Lead / SME, Enterprise Monitoring
Role Overview
The Enterprise Monitoring Lead (SME) is responsible for end-to-end ownership of the monitoring platform, strategy, and operations. This role acts as the technical authority and escalation point (L3) while driving monitoring maturity, automation, and service reliability improvements.
The Lead ensures that monitoring is proactive, intelligent, and aligned to business services, not just tool-based alerting.
Key Responsibilities
Monitoring Strategy & Governance
- Define and implement enterprise monitoring strategy aligned to business services
- Ensure monitoring coverage across:
  - Infrastructure (Server, Network)
  - Applications (APM)
  - Cloud environments (Azure/AWS)
- Establish monitoring standards, thresholds, and best practices
- Drive the shift from alert-based monitoring to service-centric monitoring
Technical Leadership & Escalation (L3)
- Act as the highest escalation point for critical and complex incidents (P1/P2)
- Provide deep technical expertise across:
  - Infrastructure, application, and network layers
  - Performance and availability issues
- Lead troubleshooting during major incidents (Major Incident Management, MIM)
- Work closely with OEM/vendor support for complex issues
Monitoring Tools & Platform Ownership
- Own and manage enterprise monitoring tools:
  - SCOM / SolarWinds / Dynatrace / AppDynamics / Azure Monitor (as applicable)
- Design and implement:
  - Custom dashboards
  - Alerting logic and correlation rules
  - Synthetic and real-user monitoring
- Ensure tool stability, scalability, and integration with ITSM tools (ServiceNow)
Continuous Improvement & Optimization
- Drive alert noise reduction initiatives with measurable percentage-reduction targets
- Identify recurring issues and lead Problem Management efforts
- Define and track monitoring KPIs (MTTR, MTTD, alert accuracy, etc.)
- Introduce AIOps / automation / predictive monitoring where applicable
Automation & Engineering
- Lead automation initiatives using:
  - PowerShell / Python / Shell scripting
- Automate repetitive monitoring tasks, alert remediation, and reporting
- Integrate monitoring with CI/CD pipelines where applicable
Team Leadership & Capability Building
- Mentor and upskill L1 & L2 teams
- Define training plans and improve technical depth of the team
- Review incident handling quality and provide feedback
- Ensure adherence to SOPs while encouraging analytical thinking
Stakeholder Management
- Act as primary technical contact for client stakeholders
- Provide insights through:
  - Weekly/Monthly service reviews
  - Monitoring health reports
  - Improvement roadmaps
- Translate technical issues into business-impact terms
Shift & Coverage
- Provide support during critical incidents (on-call model)
- Ensure overall 24x7 monitoring operations are stable and efficient
Required Skills & Qualifications
Technical Expertise (Non-Negotiable)
- Strong hands-on experience across:
  - Windows/Linux systems
  - Network fundamentals and troubleshooting
  - Application performance monitoring (APM tools like Dynatrace/AppDynamics)
- Deep expertise in at least one enterprise monitoring tool
- Strong understanding of:
  - Cloud monitoring (Azure Monitor / AWS CloudWatch)
  - Log analytics (Splunk, ELK, etc.)
Engineering & Automation
- Strong scripting skills:
  - PowerShell / Python / Bash
- Experience with automation frameworks and APIs (preferred)
- Exposure to AIOps tools (good to have)
Process & Frameworks
- Strong ITIL knowledge:
  - Incident, Problem, Event, Change Management
- Experience in driving Problem Management & RCA reviews
Leadership Skills
- Strong decision-making during high-pressure incidents
- Ability to lead technical discussions with confidence
- Coaching and mentoring mindset
- Structured communication (especially with global stakeholders)
Experience & Education
- 7–10+ years of experience in monitoring / NOC / infrastructure engineering
- 2–3 years in a lead or SME role
- Bachelor's degree in IT, Computer Science, or a related field
- ITIL certification (preferred)
- Relevant certifications (Azure, AWS, Dynatrace, etc.) are a strong advantage
Qualifications
- Education: Graduation
- Experience: minimum 7 years, maximum 10 years