We are seeking to add a Command Centre & User Experience Monitoring – Senior Analyst to our growing talent pool. The ideal candidate will have 3+ years of hands-on experience in Command Centre operations, synthetic monitoring, and application/infrastructure support—preferably within the e‑commerce domain.
Job Summary:
As a Senior Analyst, you will play a critical role in monitoring, analyzing, and ensuring the stability and performance of key IT systems and business‑critical applications. You will operate within a 24×7 Command Centre environment, responding to incidents, performing technical troubleshooting, establishing end‑to‑end monitoring across APM tools, and escalating issues appropriately to minimize downtime and ensure seamless service availability.
The role requires strong communication and analytical skills, with the ability to listen, absorb information, and relay insights clearly to stakeholders. You will be expected to identify improvement opportunities, articulate concerns, and collaborate effectively across teams while adhering to defined SLAs and quality standards. A proactive mindset, a sense of ownership, and a willingness to take on responsibilities beyond the immediate scope of work are strong advantages. You should also be open to working in rotational shifts and providing on‑call support when required.
Roles and Responsibilities:
- Monitor IT systems, applications, and infrastructure to ensure high availability, performance, and stability.
- Serve as the gatekeeper for all incident queues, ensuring efficient governance, prioritization, and flow management.
- Respond proactively to system alerts, events, and incidents by performing initial triage, analysis, and timely resolution.
- Lead technical troubleshooting calls and Major Incident Management (MIM) bridges to drive quick containment and recovery.
- Escalate incidents to relevant resolver groups based on impact, urgency, and predefined workflows.
- Document incidents, actions taken, root cause findings, and resolutions in a clear, accurate, and audit‑friendly manner.
- Communicate effectively with internal IT teams, cross-functional stakeholders, vendors, and management during active incidents.
- Provide status updates, impact assessments, and risk visibility throughout the incident lifecycle.
- Participate in enhancement and rollout of monitoring tools, dashboards, and operational procedures.
- Expand monitoring coverage across platforms using tools such as Dynatrace, GlassBox, Splunk, and others.
- Utilize specialized tools to identify and analyze customer experience issues on the live site/production environment.
- Identify system gaps, recurring issues, and optimization opportunities to improve IT reliability and operational efficiency.
- Work closely with managers to streamline incident-handling processes and improve KPIs such as MTTD, MTTI, and MTTR.
- Prepare daily and weekly reports covering alerts, incident trends, and platform health across dashboards.
- Maintain up‑to‑date SOPs, knowledge base articles, documentation, and reference guides for all monitored systems and processes.
- Lead the team in enhancing skills related to monitoring, environment understanding, incident handling, and service request execution.
- Take ownership of documentation for new projects, obtain first-hand knowledge through training, and refine/transfer knowledge to the broader team.
Technical and Functional Skills:
Bachelor's Degree with 3+ years of hands-on experience in incident handling, platform monitoring, troubleshooting, APM tools, and reporting. Strong platform monitoring and troubleshooting expertise is a core requirement.
Solid understanding of ITSM processes, including Incident Management, Change Management, and Problem Management. ITIL certification is an added advantage.
Strong application server knowledge is required for effective technical troubleshooting across distributed environments.
Experience supporting e‑commerce application flows such as Cart, Payments, Checkout, and Order orchestration.
Advanced expertise in Dynatrace, including:
- Dynatrace environment setup and configuration
- ActiveGate deployment and management
- Working with Extensions 2.0, Smartscape topology, and Davis AI
- Designing and implementing monitoring for complex integrations (REST/SOAP APIs, file-based interfaces, scheduled jobs, orchestrations)
- Configuring Davis AI–driven alerting and anomaly detection for proactive issue identification
- Building executive-level dashboards and detailed technical dashboards (e.g., integration flow performance, latency metrics, error patterns)
- Deep-dive analysis of failed instances—reviewing HTTP status codes, payloads, connector faults, and performance bottlenecks
Proficiency in MS Office tools, especially:
- MS Excel (advanced reporting, pivot tables, analytics)
- MS PowerPoint (executive-ready presentations and performance trend reporting)
Good-to-Have Skills:
- Experience in designing and building automation solutions to enhance the efficiency of Command Centre operations.