Responsible for production support and release management for the assigned applications (SRE C1, Disaster Recovery: DR activities).
Should ensure high availability, resilience, incident management, DR readiness, and operational excellence across production environments.
Should possess excellent troubleshooting and analytical skills.
Key Responsibilities
Disaster Recovery (DR) Management
Lead end-to-end DR drills, failover, fallback, and BCP activities for critical applications and infrastructure.
Coordinate with application, database, network, cloud, security, and vendor teams during DR exercises.
Prepare DR Runbooks, recovery procedures, RCA reports, and audit documentation.
Ensure DR compliance with regulatory and internal governance requirements.
Validate RPO/RTO adherence and recommend resilience improvements.
Conduct periodic DR readiness reviews and gap assessments.
Ensure robust replication between primary and secondary sites.
Oversee daily backups.
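As an illustration of the RPO/RTO validation listed above, the following is a minimal Python sketch of checking a single drill against its targets. The `DrillResult` type, the `check_rpo_rto` function, the timestamps, and the target values are all hypothetical and chosen only for the example. Actual targets come from the organisation's BCP and regulatory requirements.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DrillResult:
    """Hypothetical record of one DR drill for one application."""
    app: str
    last_replicated_at: datetime   # last data confirmed on the secondary site
    outage_started_at: datetime    # moment the primary site was declared down
    service_restored_at: datetime  # moment service resumed on the secondary site


def check_rpo_rto(result: DrillResult, rpo_target: timedelta, rto_target: timedelta) -> dict:
    """Compare achieved RPO/RTO for a drill against the agreed targets."""
    # Achieved RPO: window of data that could be lost (outage start minus last replication).
    achieved_rpo = result.outage_started_at - result.last_replicated_at
    # Achieved RTO: time taken to restore service after the outage started.
    achieved_rto = result.service_restored_at - result.outage_started_at
    return {
        "app": result.app,
        "achieved_rpo": achieved_rpo,
        "achieved_rto": achieved_rto,
        "rpo_met": achieved_rpo <= rpo_target,
        "rto_met": achieved_rto <= rto_target,
    }


# Example values are assumptions, not real drill data.
drill = DrillResult(
    app="payments-gateway",
    last_replicated_at=datetime(2024, 1, 20, 9, 58),
    outage_started_at=datetime(2024, 1, 20, 10, 0),
    service_restored_at=datetime(2024, 1, 20, 10, 45),
)
report = check_rpo_rto(drill, rpo_target=timedelta(minutes=5), rto_target=timedelta(minutes=60))
print(report)
```

A per-application report of this kind could feed the DR readiness reviews and gap assessments, with any `False` flag marking an application for follow-up.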
Reliability Engineering
Improve system reliability, uptime, scalability, and operational efficiency.
Implement SRE best practices including SLI/SLO/SLA management.
Support release management, deployment validation, and change activities.
Collaborate with DevOps, Infra, Security, and Application teams for platform stability.
Participate in on-call support and critical incident handling.
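To make the SLI/SLO practice above concrete, here is a minimal Python sketch of an availability SLI and the error budget derived from an SLO. The function name `error_budget`, the 99.9% target, and the request counts are assumptions for illustration only, not figures from any real service.

```python
def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Compute an availability SLI and the share of error budget still unspent.

    slo_target is a fraction strictly between 0 and 1, e.g. 0.999 for 99.9%.
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0 or not 0 <= failed_requests <= total_requests:
        raise ValueError("request counts are inconsistent")

    # SLI: fraction of requests that succeeded in the measurement window.
    sli = (total_requests - failed_requests) / total_requests
    # Error budget: number of failures the SLO tolerates in this window.
    allowed_failures = (1 - slo_target) * total_requests
    # Fraction of the budget left; a negative value means the SLO is breached.
    budget_remaining = 1 - failed_requests / allowed_failures
    return {
        "sli": sli,
        "slo_met": sli >= slo_target,
        "allowed_failures": allowed_failures,
        "budget_remaining": budget_remaining,
    }


# Hypothetical window: 1,000,000 requests, 400 failures, 99.9% SLO.
status = error_budget(slo_target=0.999, total_requests=1_000_000, failed_requests=400)
print(status)
```

In this hypothetical window roughly 1,000 failures are tolerated, so 400 failures leave about 60% of the budget. A team might use the remaining budget to decide whether a risky release should proceed or wait.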
Other Key Responsibilities
Provide regular updates on application health and incidents.
Support reporting to senior managers and business stakeholders.
Coordinate with DevOps, infrastructure, security, DBA, and application teams.
Participate in release management and deployment activities.
Prepare RCA, SOP, KT, and operational documentation.
Support audits, compliance, and governance requirements.
Track resolution of all open issues and take part in the solutioning team during war rooms and discussions.
Manage audit queries related to the production environment.
Mandatory Skills Required
Overall 5 years of experience participating in or conducting DR planning, including at least 2 years at the application/site level as per RBI guidelines, and having orchestrated end-to-end DR activities every year for a minimum of 15 to 20 applications.
Strong understanding of SRE, Production Support, and DR/BCP concepts.
Experience with Linux/Unix and Windows production environments.
Ability to work in rotational shifts and provide on-call support.
Leadership and team coordination skills.
Strong analytical and troubleshooting skills.
Experience in BFSI production environments with a 24x7 support model.
Good understanding of ITIL processes (Incident, Problem, Change, Release).
Exposure to InfoSec, compliance, and audit requirements.
Excellent stakeholder coordination and communication skills.
Qualifications
Bachelor's or Master's degree in Computer Science, Information Technology, or a related field.