Senior Associate - Reliability Operations

  • Posted 12 hours ago

Job Description

About Zeta

Zeta is a Next-Gen Banking Tech company that empowers banks and fintechs to launch banking products for the future. It was co-founded by Ramki Gaddipati in 2015.

Our flagship processing platform - Zeta Tachyon - is the industry's first modern, cloud-native, and fully API-enabled stack that brings together issuance, processing, lending, core banking, fraud & risk, and many more capabilities as a single-vendor stack. 20M+ cards have been issued on our platform globally.

Zeta is actively working with the largest Banks and Fintechs in multiple global markets, transforming customer experience for multi-million card portfolios.
Zeta has over 1,700 employees, with over 70% of roles in R&D, across locations in the US, EMEA, and Asia. We raised $280 million at a $1.5 billion valuation from SoftBank, Mastercard, and other investors in 2021.


About the Role


  • The Senior Associate - Reliability Operations role is critical to ensuring the continuous, reliable, and secure operation of our SaaS products, operating in a 24x7 support capacity. This role involves proactive monitoring, incident response, and collaboration with teams across the organization to maintain optimal service levels. The Senior Associate will participate in a rotating shift schedule to ensure high availability, rapid issue resolution, and support for key reliability initiatives. The Senior Associate will also serve as a key escalation point, mentor junior team members, and lead critical efforts to optimize operational workflows and systems.

  • Responsibilities:


  • 24x7 Monitoring and Support: Oversee the health, performance, and availability of cloud-based SaaS infrastructure and applications, using monitoring tools like Prometheus and Grafana, and respond to alerts during assigned shifts. Align with and adhere to organizational processes to maintain SLAs.
  • Incident Management: Act as the first responder in a 24x7 rotation, managing and mitigating service disruptions, following standard incident procedures, and escalating issues to SMEs as needed.
  • Deployments and Change Management: Manage the deployment lifecycle of applications. Proactively engage with SMEs to resolve deployment process issues or challenges.
  • Troubleshooting and Resolution: Use diagnostic tools and scripts to resolve common issues in real time, and collaborate with cross-functional teams to analyze and address root causes.
  • Service Health and Reliability: Assist in defining and refining SLAs, SLOs, and SLIs; perform routine checks and follow established runbooks to maintain consistent service reliability.
  • Analysis and Reporting: Regularly review incident data to identify patterns, improve service resilience, and produce shift reports summarizing system health and resolved incidents.
  • Documentation and Knowledge Base: Document incident resolutions, update runbooks, and contribute to an internal knowledge base to improve team response and efficiency.
  • Continuous Improvement Initiatives: Participate in reliability enhancement projects, including automation, configuration management, and tools improvement.
  • Collaboration: Communicate effectively with SMEs to relay critical incident information, insights, and preventive recommendations.
  • Mentorship: Work closely with team members to provide guidance during shifts and share insights on improving incident response.

  • Experience and Qualifications


  • Education: IT, Computers, BCA or equivalent.
  • Experience: 2-4 years of experience in reliability operations or a related 24x7 support role within SaaS or cloud environments.

  • Skills


  • Proficiency in monitoring and alerting tools, such as Prometheus, Grafana, Datadog, or Splunk.
  • Ability to remain composed in high-stakes situations and resolve incidents promptly.
  • Strong verbal and written communication skills to document and relay incident information effectively.

  • Shift Information


  • 24x7 Rotational Shifts: This role requires availability to work rotating shifts, including nights, weekends, and holidays, to ensure 24x7 support coverage.


  • More Info

    About Company

    Zeta is the world's first Omni Stack for credit cards. A single stack for Origination, Processing, FRM, Rewards, Loans, APIs, and Apps.

    Job ID: 139120479