IntraEdge

Resiliency and Continuity Specialist

Job Description

Job Title: Resiliency and Continuity Specialist

Experience: 5+ Years

Location: Hyderabad

Employment Type: Full-Time

About the Role

We are seeking a highly motivated Resiliency and Continuity Specialist to support enterprise technology resilience initiatives and ensure cloud-hosted applications and platforms maintain high levels of availability, recoverability, and operational readiness.

This role serves as a subject matter expert in technology resilience, disaster recovery, and operational continuity. The ideal candidate will work closely with engineering, SRE, cloud infrastructure, application, and governance teams to coordinate resilience testing, validate recovery capabilities, review recovery plans, and ensure compliance with organizational resiliency standards.

The successful candidate will play a critical role in strengthening enterprise recovery capabilities through resilience exercises, chaos testing, audit-ready documentation, and continuous improvement initiatives.

Key Responsibilities

Cloud Resilience Testing & Recovery Validation

  • Coordinate, plan, and support execution of cloud resilience and disaster recovery exercises across enterprise applications and platforms.
  • Conduct both in-region and cross-region resilience testing to validate system recovery capabilities.
  • Ensure recovery testing aligns with defined Recovery Time Objectives (RTOs) and Recovery Point Objectives (RPOs).
  • Collaborate with engineering and operations teams to develop meaningful resilience and chaos engineering scenarios.
  • Validate application recoverability, service continuity, and infrastructure resilience through structured testing exercises.

System Recovery Plan (SRP) Governance

  • Review and validate System Recovery Plans (SRPs) to ensure completeness, accuracy, and operational readiness.
  • Verify recovery procedures, dependencies, sequencing, ownership, and execution timelines.
  • Ensure adherence to organizational resiliency frameworks, templates, and standards.
  • Identify gaps, risks, and improvement opportunities within recovery documentation.
  • Drive remediation efforts to improve recovery preparedness.

Resilience Exercise Management

  • Coordinate all phases of resilience exercises, including:
      • Planning
      • Scheduling
      • Stakeholder communication
      • Execution oversight
      • Post-exercise reviews
  • Ensure pre-test documentation includes:
      • Scope
      • Success criteria
      • Recovery steps
      • Roles and responsibilities
      • Dependency mapping
  • Track and document exercise outcomes, deviations, and lessons learned.
  • Facilitate post-mortem reviews and continuous improvement activities.

Evidence Validation & Audit Readiness

  • Review resilience testing evidence packages for completeness and compliance.
  • Ensure evidence:
      • Is properly timestamped and serialized
      • Maps to documented recovery steps
      • Demonstrates successful recovery outcomes
      • Supports audit and regulatory requirements
  • Validate recovery metrics, execution results, and test outcomes.
  • Maintain audit-ready documentation and support compliance reviews.

Operational Resilience & Governance

  • Support enterprise resilience programs and governance initiatives.
  • Assist teams in applying resilience frameworks, assessment methodologies, and operational standards.
  • Ensure resilience activities align with internal policies and regulatory expectations.
  • Participate in risk assessments, control reviews, and resilience audits.

Cross-Functional Collaboration

  • Partner with:
      • Engineering Teams
      • SRE Teams
      • Cloud Operations
      • Infrastructure Teams
      • Architecture Teams
      • Risk & Compliance Teams
  • Coordinate remediation activities and track closure of identified gaps.
  • Provide guidance on resilience best practices for cloud deployments and system changes.

Continuous Improvement

  • Contribute to the enhancement of:
      • Recovery standards
      • Resilience frameworks
      • Testing methodologies
      • Reporting processes
      • Governance controls
  • Support resilience maturity initiatives across the organization.

Required Qualifications

Experience

  • 5+ years of experience in:
      • Technology Resilience
      • Disaster Recovery
      • Operational Resilience
      • Site Reliability Engineering (SRE)
      • Technology Risk Management
      • IT Governance
      • Infrastructure Operations

Cloud & Infrastructure Knowledge

  • Strong understanding of cloud architecture and resilience principles.
  • Experience working with cloud platforms such as:
      • AWS
      • Azure
      • GCP
  • Understanding of:
      • Regions and Availability Zones
      • Load Balancing
      • Auto Scaling
      • Infrastructure as Code (IaC)
      • Backup & Restore Strategies
      • Replication Mechanisms
      • Service Dependencies
      • High Availability Architectures

Resilience & Recovery Expertise

  • Experience coordinating and executing:
      • Disaster Recovery (DR) Tests
      • Business Continuity Exercises
      • Resilience Testing
      • Chaos Engineering Simulations
  • Experience creating and maintaining recovery plans and operational documentation.
  • Strong understanding of RTO, RPO, and service recovery frameworks.

Monitoring & Observability

  • Familiarity with:
      • Monitoring Platforms
      • Observability Solutions
      • Alerting Systems
      • Incident Management Processes
  • Understanding of Chaos Engineering and resilience validation practices.

Tools & Technologies

Experience with:

  • ServiceNow
  • GRC Platforms
  • Harness
  • Microsoft Office Suite:
      • Excel
      • Word
      • PowerPoint
      • Visio
      • MS Project

Preferred Qualifications

Reporting & Analytics

  • Experience with resilience metrics, reporting, and dashboarding.
  • Knowledge of:
      • Power BI
      • Tableau
      • Advanced Excel
      • Crystal Reports

Cloud Certifications

Preferred certifications include:

  • AWS Certified Cloud Practitioner
  • Microsoft Azure Fundamentals
  • Google Cloud Digital Leader
  • Cloud Architecture Certifications

Business Continuity & Disaster Recovery Certifications

  • CBCP (Certified Business Continuity Professional)
  • Disaster Recovery Institute (DRI) Certifications
  • Business Continuity Certifications

Project Management

  • PMP
  • PRINCE2
  • Agile Certifications

Soft Skills

  • Excellent stakeholder management and communication skills.
  • Strong analytical and problem-solving abilities.
  • Ability to coordinate multiple teams across complex environments.
  • Strong documentation and governance mindset.
  • Detail-oriented with a focus on audit readiness and operational excellence.

Job ID: 149630029