Associate Manager - Reliability Operations

  • Posted 11 hours ago

Job Description

About Zeta

Zeta is a Next-Gen Banking Tech company that empowers banks and fintechs to launch banking products for the future. It was co-founded by Ramki Gaddipati in 2015.
Our flagship processing platform - Zeta Tachyon - is the industry's first modern, cloud-native, and fully API-enabled stack that brings together issuance, processing, lending, core banking, fraud & risk, and many more capabilities as a single-vendor stack. 20M+ cards have been issued on our platform globally.
Zeta is actively working with the largest banks and fintechs in multiple global markets, transforming customer experience for multi-million card portfolios.
Zeta has over 1,700 employees, with over 70% of roles in R&D, across locations in the US, EMEA, and Asia. We raised $280 million at a $1.5 billion valuation from Softbank, Mastercard, and other investors in 2021.


Role


  • The Associate Manager - Reliability Operations leads a team to rigorously uphold service level objectives (SLOs) through expert alert management, SOP-compliant ticket escalations, and coordinated support for SRE-signed deployments across multiple sites.
  • This role drives operational accountability, fosters seamless SRE partnerships, and ensures production stability in a high-stakes 24x7 SaaS environment.

Responsibilities


  • Drives SLO adherence by implementing advanced metric monitoring, enforcing error budgets, and spearheading proactive initiatives to prevent breaches and elevate system reliability.
  • Ensures all alerts receive immediate acknowledgment, with tickets escalated to SRE teams for any issues lacking defined SOPs, systematically reducing escalations, downtime, and MTTR.
  • Coordinates standard deployments across sites following SRE sign-off, overseeing logistics, real-time rollout health monitoring, and rigorous post-deployment SLO validation.
  • Collaborates strategically with SRE teams on deployment planning, comprehensive risk assessments, troubleshooting, and post-release optimizations for flawless execution and rapid recovery.
  • Oversees and refines team processes for alert triage, SOP documentation/updates, and knowledge sharing, integrating automation to minimize manual toil and enhance operational resilience.
  • Mentors staff on SLO-driven decision-making, conducts in-depth audits of alert/ticket workflows, analyses trends in operational data, and delivers actionable reliability KPI reports to stakeholders.

Skills


  • Proven track record in 24x7 SaaS/cloud support operations, handling high-pressure incidents and customer-impacting events.
  • Strong proficiency in monitoring/incident tools (Prometheus, Grafana, Splunk, PagerDuty) and ticketing systems.
  • Effective leadership and people management, with excellent communication for technical/non-technical collaboration.
  • Analytical skills to interpret operational data, identify trends, and drive process recommendations.

Experience and Qualifications


  • Familiarity with ITIL frameworks, SRE principles (e.g., error budgets, toil reduction), and cloud platforms (AWS, Azure, GCP).
  • Experience with process improvement methodologies and shift handoff protocols.
  • Knowledge of basic reliability concepts and observability stacks.
  • Education: Bachelor's degree in Information Technology, Business, or a related field; relevant IT certifications (e.g., ITIL Foundation) are a plus.
  • Experience: 6-8 years in operations support, reliability operations, or IT service management, including 2+ years in supervisory roles managing 24x7 teams.

Shift Information


  • 24x7 Operational Oversight: The role carries on-call and shift responsibilities for escalations and provides oversight of 24x7 team operations, including shift scheduling and off-hour incident coordination.


More Info

About Company

Zeta is the world's first Omni Stack for credit cards. A single stack for Origination, Processing, FRM, Rewards, Loans, APIs, and Apps.

Job ID: 139120471