Team Leader - NOC

Check Point Software

Bengaluru, India

3-12 Years

Posted a month ago

Job Description

Why Join Us

The NOC Team Leader will oversee 24/7 infrastructure, application services, and production systems, driving high availability, alert response, batch job monitoring, and cross-team collaboration for uninterrupted SaaS/web services. This senior role demands strategic leadership in a fast-paced, multi-shift environment with emphasis on troubleshooting, metrics reporting, and process optimization.

Key Responsibilities

Team Leadership & Management
Lead, mentor, and support a team of NOC engineers across all shifts, guiding them in monitoring production systems, applications, batch jobs, and diagnostic tools.
Set priorities, distribute tasks, ensure proper workload balance, and track issues through first-level analysis to closure with IT teams.
Drive professional development through training, coaching, ongoing feedback, and contributions to knowledge base articles, process documents, and playbooks.
Conduct periodic 1:1 meetings, performance evaluations, and goal-setting.
Recruit, onboard, and integrate new NOC engineers into the team.
Build and maintain a culture of accountability, high performance, service quality, and proactive collaboration with Applications, Systems, Database, and Network teams.
Operational Oversight
Own the day-to-day operations of the entire NOC function, ensuring consistent monitoring, alert handling, batch job troubleshooting, operational routine execution, and impact assessment on production schedules.
Ensure all teams consistently follow predefined procedures, escalation paths, runbooks, and change management for production environments, equipment, OS, applications, and databases.
Validate and improve health checks, monitoring dashboards (e.g., LogicMonitor), operational KPIs, and performance metrics reports (daily/weekly/monthly).
Oversee shift handovers, ensuring accuracy, clarity, and continuity of operations.
Incident Management
Serve as the primary incident coordinator for major incidents (P1/P2): oversee response efforts across shifts; perform triage, prioritization, and mitigation; and collaborate with or escalate to Support groups, Service Owners, Vendors, and Third Party Providers.
Ensure the team carries out correct triage, prioritization, and mitigation actions, using Incident and Problem Management tools.
Coordinate escalation to Tier 2/3, Infrastructure, Security, and relevant stakeholders.
Lead post-incident reviews, ensuring documentation, root cause analysis, follow-up action items, and optimization of application performance and batch streams.
Service Quality & Continuous Improvement
Monitor team performance, SLAs, KPIs, and production metrics; ensure targets are met or exceeded through proactive work with teams to optimize application/batch performance.
Identify recurring issues, monitoring gaps, and operational inefficiencies, and drive improvement initiatives, including updates to NOC processes, SOPs, runbooks, and documentation.
Collaborate with cross-functional teams (Infrastructure, Networking, Security, DevOps, Applications, Database, Systems) to enhance system reliability and monitoring coverage, manage production changes, and improve job streams.
Proactively recommend improvements to monitoring, alerting, automation, NOC workflows, and application/web server technologies.
Communication & Reporting
Provide clear and consistent communication to management regarding incidents, trends, risks, and operational status, using excellent oral and written English skills.
Deliver daily/weekly/monthly operational reports, including incident summaries, performance metrics, and team insights.
Represent the NOC function in internal meetings, service reviews, and cross-team coordination sessions, addressing conflicts constructively.

Qualifications

Bachelor's Degree in Computer Science, Information Systems, IT, Electrical Engineering, or related field; Master's preferred; or equivalent work experience.
Certifications: ITIL (v3/4), CCNA, CISSP, PMP, or Agile.
Proven experience leading or managing technical teams in a NOC, Operations, or Monitoring environment (at least 3 years leading teams in 24x7 SaaS/web production settings).
10+ years of extensive NOC experience across a variety of systems.
Strong troubleshooting expertise across network, system, cloud, and application stacks, as well as database management, batch jobs, and production schedules.
Experience with Linux system administration (logs, services, resource usage, shell scripting/command line) and Windows Server fundamentals.
Familiarity with cloud platforms (AWS, Azure, GCP) and cloud monitoring concepts.
Hands-on experience with monitoring/alerting platforms (Icinga, Prometheus, Grafana, PagerDuty, LogicMonitor, or equivalent) and application/web servers (Apache Tomcat, IIS).
Ability to interpret logs, alerts, metrics, and telemetry data, and to guide team troubleshooting.
Experience with ticketing/incident/problem management tools (Jira, ServiceNow).
Excellent communication skills (high proficiency in English, written/verbal), high situational awareness, calm decision-making under pressure, time management, organizational skills, and ability to handle multiple tasks with minimal supervision.
Ability to work flexible schedules across shifts.

Good To Have

Knowledge of key network protocols (TCP/IP, UDP, DNS, HTTP/S, SSH, BGP fundamentals, FTP) and utilities (Telnet, curl).
Understanding of VPNs, firewalls, load balancers, proxies, and general IT infrastructure.