  • Posted a month ago

Job Description

Why Join Us

The NOC Team Leader will oversee 24/7 infrastructure, application services, and production systems, driving high availability, alert response, batch job monitoring, and cross-team collaboration for uninterrupted SaaS/web services. This senior role demands strategic leadership in a fast-paced, multi-shift environment with emphasis on troubleshooting, metrics reporting, and process optimization.

Key Responsibilities

  • Team Leadership & Management
      • Lead, mentor, and support a team of NOC engineers across all shifts, guiding them in monitoring production systems, applications, batch jobs, and diagnostic tools.
      • Set priorities, distribute tasks, ensure proper workload balance, and track issues from first-level analysis through to closure with IT teams.
      • Drive professional development through training, coaching, ongoing feedback, and contributions to knowledge base articles, process documents, and playbooks.
      • Conduct periodic 1:1 meetings, performance evaluations, and goal-setting.
      • Recruit, onboard, and integrate new NOC engineers into the team.
      • Build and maintain a culture of accountability, high performance, service quality, and proactive collaboration with Applications, Systems, Database, and Network teams.
  • Operational Oversight
      • Own the day-to-day operations of the entire NOC function, ensuring consistent monitoring, alert handling, batch job troubleshooting, operational routine execution, and impact assessment on production schedules.
      • Ensure all teams consistently follow predefined procedures, escalation paths, runbooks, and change management for production environments, equipment, OS, applications, and databases.
      • Validate and improve health checks, monitoring dashboards (e.g., LogicMonitor), operational KPIs, and performance metrics reports (daily/weekly/monthly).
      • Oversee shift handovers, ensuring accuracy, clarity, and continuity of operations.
  • Incident Management
      • Serve as the primary incident coordinator for major incidents (P1/P2): oversee response efforts across shifts; perform triage, prioritization, and mitigation; and collaborate with, or escalate to, Support groups, Service Owners, Vendors, and Third Party Providers.
      • Ensure the team carries out correct triage, prioritization, and mitigation actions, using Incident and Problem Management tools.
      • Coordinate escalation to Tier 2/3, Infrastructure, Security, and relevant stakeholders.
      • Lead post-incident reviews, ensuring documentation, root cause analysis, follow-up action items, and optimization of application performance and batch streams.
  • Service Quality & Continuous Improvement
      • Monitor team performance, SLAs, KPIs, and production metrics; ensure targets are met or exceeded by working proactively with teams to optimize application and batch performance.
      • Identify recurring issues, monitoring gaps, and operational inefficiencies, and drive improvement initiatives, including updates to NOC processes, SOPs, runbooks, and documentation.
      • Collaborate with cross-functional teams (Infrastructure, Networking, Security, DevOps, Applications, Database, Systems) to enhance system reliability and monitoring coverage, manage production changes, and improve job streams.
      • Proactively recommend improvements to monitoring, alerting, automation, NOC workflows, and application/web server technologies.
  • Communication & Reporting
      • Provide clear and consistent communication to management regarding incidents, trends, risks, and operational status, using excellent oral and written English skills.
      • Deliver daily/weekly/monthly operational reports, including incident summaries, performance metrics, and team insights.
      • Represent the NOC function in internal meetings, service reviews, and cross-team coordination sessions, addressing conflicts constructively.

Qualifications

  • Bachelor's Degree in Computer Science, Information Systems, IT, Electrical Engineering, or related field; Master's preferred; or equivalent work experience.
  • Certifications: ITIL (v3/4), CCNA, CISSP, PMP, or Agile.
  • Proven experience leading or managing technical teams in a NOC, Operations, or Monitoring environment (minimum 3 years leading teams in 24x7 SaaS/web production settings).
  • 10+ years of extensive NOC experience across a variety of systems.
  • Strong troubleshooting expertise across network, system, cloud, and application stacks, as well as database management, batch jobs, and production schedules.
  • Experience with Linux system administration (logs, services, resource usage, shell scripting/command line) and Windows Server fundamentals.
  • Familiarity with cloud platforms (AWS, Azure, GCP) and cloud monitoring concepts.
  • Hands-on experience with monitoring/alerting platforms (Icinga, Prometheus, Grafana, PagerDuty, LogicMonitor, or equivalent) and application/web servers (Apache Tomcat, IIS).
  • Ability to interpret logs, alerts, metrics, and telemetry data, and to guide team troubleshooting.
  • Experience with ticketing/incident/problem management tools (Jira, ServiceNow).
  • Excellent communication skills (high proficiency in English, written/verbal), high situational awareness, calm decision-making under pressure, time management, organizational skills, and ability to handle multiple tasks with minimal supervision.
  • Ability to work flexible schedules across shifts.

Good To Have

  • Knowledge of key network protocols (TCP/IP, UDP, DNS, HTTP/S, SSH, FTP, BGP fundamentals) and utilities (Telnet, cURL).
  • Understanding of VPNs, firewalls, load balancers, proxies, and general IT infrastructure.

Job ID: 142111657