We are seeking a highly skilled and motivated Tech Lead – Disaster Recovery Specialist to design, implement, and maintain robust disaster recovery (DR) solutions that ensure the resilience, scalability, and reliability of our IT infrastructure and applications. The ideal candidate will be responsible for all aspects of disaster recovery in an AWS environment, including planning, testing, automation, and continuous improvement.
Key Responsibilities
Disaster Recovery & Business Continuity
- Develop, implement, and maintain disaster recovery (DR) plans to ensure quick restoration of critical systems and data in case of a disaster.
- Plan, design, document, and test disaster recovery solutions to meet business and technology requirements.
- Conduct regular DR drills and simulations to assess the effectiveness of the recovery strategy and identify areas for improvement.
- Collaborate with stakeholders to identify mission-critical systems and data that require protection under the DR plan.
- Manage backup solutions, ensuring that data is regularly backed up and can be restored efficiently.
Monitoring, Automation & Infrastructure Management
- Implement and manage monitoring, logging, and alerting solutions to maintain system health and performance.
- Automate disaster recovery processes to improve efficiency and reduce manual intervention.
- Maintain and enhance Infrastructure-as-Code (IaC) practices using tools such as Terraform, Ansible, or AWS CloudFormation.
- Manage containerization and orchestration using platforms like Docker and Kubernetes.
Continuous Improvement & Collaboration
- Drive continuous improvement initiatives to enhance the efficiency and effectiveness of disaster recovery strategies.
- Work closely with cross-functional teams to align DR solutions with broader DevOps and cloud infrastructure strategies.
- Ensure DR solutions adhere to compliance requirements, security standards, and best practices for cloud-based environments.
Requirements
Qualifications & Experience
- Bachelor's degree in Computer Science, Information Technology, or a related field (or equivalent experience).
- 7+ years of experience with AWS infrastructure, with a focus on disaster recovery and business continuity.
- Strong understanding of DevOps principles, including continuous delivery, continuous deployment, and continuous improvement.
- Hands-on experience with DevOps platform tooling such as Git-based repositories, Jenkins, Docker, and Kubernetes.
- Expertise in Infrastructure-as-Code (IaC) tools such as Terraform, Ansible, or AWS CloudFormation.
- Proven ability to design, implement, and test DR solutions in a cloud environment (AWS preferred).