Senior Support Engineer

Infinite Computer Solutions

Bengaluru, India

3-5 Years

Posted a day ago

Job Description

L2 Enterprise Monitoring Engineer

Role Overview

The L2 Enterprise Monitoring Engineer is responsible for advanced monitoring, incident analysis, and troubleshooting across infrastructure, applications, and network layers. This role acts as the primary resolver group for monitoring-triggered incidents and plays a key role in reducing alert noise, improving monitoring effectiveness, and driving faster resolution.

L2 engineers are expected to go beyond SOPs: analyze, fix, and improve.

Key Responsibilities

Advanced Monitoring & Event Analysis

Perform deep analysis of alerts generated from enterprise monitoring tools (SolarWinds, SCOM, Dynatrace, etc.)
Correlate multiple alerts/events to identify underlying issues (avoid symptom-based handling)
Fine-tune alert thresholds and suppress false positives
Identify gaps in monitoring coverage and recommend improvements

Incident Troubleshooting & Resolution

Take ownership of P2/P3 incidents and support P1 (Major Incidents)
Perform detailed troubleshooting across servers (Windows/Linux), network (connectivity, latency, packet loss), and applications (availability, performance)
Execute standard fixes, workarounds, and recovery actions
Engage L3/OEM vendors when required with proper diagnostics

Major Incident Support (MIM)

Support Major Incident calls by providing technical insights and updates
Perform real-time troubleshooting and log analysis during outages
Ensure quick identification of root cause or workaround
Provide inputs for incident timelines and updates

Automation & Monitoring Optimization

Create and enhance monitoring scripts, thresholds, and alert logic
Automate repetitive tasks using scripting (PowerShell, Shell, or Python at a basic level)
Drive reduction in alert noise and manual effort
Contribute to continuous improvement initiatives

Knowledge Management & Documentation

Create and update Knowledge Base (KB) articles and runbooks
Document known errors and workarounds
Ensure troubleshooting steps are reusable by L1 team

Collaboration & Escalation

Act as technical escalation point for L1 team
Guide L1 analysts on triage and handling improvements
Coordinate with cross-functional teams (Infra, App, Network, Cloud)
Ensure proper escalation to L3 with complete diagnostics

Shift & Operations

Participate in 24x7 rotational shifts (including weekends/on-call if applicable)
Ensure high-quality shift handovers with actionable insights

Required Skills & Qualifications

Technical Skills (Core Expectation)

Strong hands-on experience in Windows & Linux server administration; network fundamentals (DNS, TCP/IP, routing basics); and application monitoring concepts (APM tools like Dynatrace/AppDynamics preferred)
Strong working knowledge of monitoring tools: SCOM / SolarWinds / Dynatrace / Nagios / Zabbix
Log analysis skills (Event Viewer, syslogs, basic Splunk/Kibana exposure preferred)
Basic scripting skills: PowerShell / Bash / Python (any one)

Process & Frameworks

Strong understanding of ITIL: Incident Management, Event Management, and Problem Management (basic involvement)

Soft Skills (Non-Negotiable)

Strong communication: clear, structured, and confident (especially with US stakeholders)
Analytical thinking (must move beyond checklist-based work)
Ownership mindset: drives issues to closure
Ability to work under pressure during incidents

Experience & Education