Information Technology Operations Lead

Bima Sugam India Federation

Mumbai, India

12-14 Years

Posted 2 days ago

Job Description

The Technology Operations Lead owns and drives production stability, observability, and operational governance of digital platforms and applications, ensuring they function seamlessly in production environments.

Title:

Technology Operations Lead

Position Objectives

The role acts as the single point of ownership for production operations and is responsible for incident and problem management, change and release governance, and observability effectiveness, while ensuring alignment with business SLAs, regulatory requirements, and enterprise standards.

Indicative Responsibilities

1. Application Production Support

• Own availability, reliability, and performance of production business applications and digital platforms

• Own operational acceptance of applications before they go live; ensure readiness across the support model (L1/L2/L3), documentation and runbooks, capacity and performance baselines, and DR and backup readiness

• Obtain sign-off from the application owner that applications are fit for production support

• Ensure adherence to SLA, uptime and performance benchmarks

• Maintain end-to-end visibility across application and infrastructure layers

• Govern capacity planning, especially for peak loads and business events

• Participate in DR drills and failover testing

• Support vulnerability remediation prioritization

2. Incident Management (Command & Control)

• Act as Incident Commander for P1 incidents: drive war rooms, triage, and cross-team coordination (App & Infra)

• Ensure rapid restoration of services and structured internal stakeholder communication across teams

• Track and reduce incident frequency and impact

• Ensure incidents are logged, tracked, categorized, and closed as per ITSM processes

3. Problem Management & RCA Governance

• Validate the quality, depth, and accuracy of RCAs provided by internal teams and vendors/partners.

• Ensure permanent fixes and prevention of recurring issues

• Maintain and track problem backlog and corrective actions

4. Change & Release Governance

• Participate in change and release governance from a production stability perspective

• Review production readiness for releases, including rollback and recovery plans, monitoring and alerting readiness, support runbooks, and escalation models

• Approve/reject changes based on change process completeness

• Ensure a controlled and stable release cycle

5. Observability & Monitoring Governance

• Govern Application Performance Monitoring (APM) and metrics; maintain visibility across application and infrastructure dependencies

• Contribute to enhancing infrastructure monitoring frameworks.

• Improve alert quality, reduce noise and ensure actionable monitoring

• Enable proactive detection of issues

6. Vendor Management & Governance

• Manage vendor partners for production operations

• Ensure adherence to SLA, response timelines and quality standards

• Prevent blame shifting and enforce clear ownership & accountability

• Drive performance reviews and escalation management

• Seek monthly and quarterly operations health reports

• Own and validate production operations dashboards shared by partners/vendors, covering availability, incidents, business journeys, change stability, and observability effectiveness

7. Continuous Improvement & Operational Excellence

• Identify patterns in incidents and performance issues

• Drive process improvements and operational maturity

• Improve MTTD, MTTR and overall system reliability

Reports To: Head Infrastructure

Coverage / Sub functions

• Technology Operations – Production Application Stability & Performance

• Incident & Problem Management

• Change & Release Management

• Observability & Monitoring Governance

• Operational Readiness & Business Continuity

Key Skills & Competencies

A) Technical Skills

• Hands-on understanding of AWS cloud services, including Kubernetes, containerized application platforms, and distributed systems concepts (timeouts, retries, partial failures, and cascading impact)

• Operational understanding of storage and database services (including RDS, Aurora, DocumentDB, etc.)

• Strong understanding of application architectures, APIs, and microservices-based platforms

• Ability to trace end-to-end request flows across multiple services

• Ability to correlate logs, metrics, and traces to diagnose production issues

• Knowledge of observability tools (APM, ELK/OpenSearch, Prometheus, Grafana, Jaeger)

• Experience in incident, problem and change management (ITIL practices)

• Understanding of infrastructure and system dependencies

• Ability to analyze and troubleshoot cloud-specific failure patterns such as throttling, saturation, connectivity issues, and regional dependencies

B) Strategic Thinking and Problem-Solving

• Ability to analyze infrastructure challenges and propose reliable and scalable solutions.

• Ability to drive end-to-end issue resolution across multiple domains

• Strong analytical approach to incident trends and system behavior

• Capability to balance risk, stability and speed of delivery

• Decision-making in high-pressure production situations

• Continuously improve monitoring, alerting and operational processes

C) Communication and Interpersonal Skills

• Strong ability to manage cross-functional teams and vendors

• Effective communication with business, leadership and technical stakeholders

• Ability to handle critical incident communication calmly and clearly

D) Governance and Compliance

• Proficiency in establishing IT governance frameworks and ensuring compliance.

• Ability to generate and present detailed reports for regulatory bodies

Qualifications: Education and Experience

• Bachelor's or master's degree in Computer Science, Information Technology, Engineering, or equivalent.

• 12+ years of experience in Application Production Support & Technology Operations leadership with strong exposure to:

o AWS cloud services, Kubernetes, database services, and API architecture

o Observability stack (APM, ELK, Prometheus, Grafana, Jaeger)

o Incident, Problem & Change management

o Production stability & release governance

o Improving MTTR, MTTD, and operational maturity

o Strong experience in digital platforms, cloud-native architectures and regulatory environments.

• Relevant certifications preferred:

o Cloud: AWS or Azure

o ITIL/ITSM frameworks

o Observability & DevOps

Location: Mumbai - Powai (work from office)

More Info

Job Type:

Permanent Job

Industry:

Other

Function:

Technology Operations

Employment Type:

Full time

About Company

Bima Sugam India Federation

Job Source: www.linkedin.com

Job ID: 148661391

Last Updated: 03-06-2026 05:35:44 PM
