We are seeking a highly skilled and experienced AWS DevOps/Cloud Infrastructure Engineer to manage, optimize, and secure our mission-critical cloud environment. The ideal candidate will have deep expertise across networking, advanced security, containerized compute (ECS Fargate), data management, and compliance, with a proven track record of implementing and maintaining robust DR/BCP programs.
The Core Responsibilities For The Job Include The Following
Security And Compliance Hardening
- Design, implement, and operate high-assurance cryptographic key management systems using AWS CloudHSM to meet stringent regulatory and security compliance requirements.
- Enforce a strong security posture using IAM with MFA, AWS Secrets Manager, and comprehensive encryption strategies.
- Implement and manage EDR (Endpoint Detection and Response) solutions across all relevant compute instances (UAT EC2 management hosts).
- Configure and manage web application protection using AWS WAF and Shield in conjunction with the ALB.
- Maintain threat detection using Amazon GuardDuty, configuration management with AWS Config, and logging via CloudTrail.
Disaster Recovery (DR) And Business Continuity
- Conduct DR Drills: Plan, execute, and document quarterly DR failover simulations for critical components, especially the cross-region SQL Server replica.
- Validate the end-to-end recovery processes, measure Recovery Time Objective (RTO) and Recovery Point Objective (RPO) metrics, and perform performance tuning for DR environments to ensure rapid and compliant recovery.
- Participate in and drive regular VAPT (Vulnerability Assessment and Penetration Testing) and BCP/DR drills.
Compute, Containerization, And Performance
- Manage and optimize containerized application deployments on Amazon ECS with AWS Fargate, and maintain the image repository in Amazon ECR.
- Use Amazon Inspector for continuous vulnerability scanning of container images.
- Support Load Testing: Assist with planned load tests, including configuring IP whitelisting in security controls (such as WAF and Security Groups) so that test traffic is not blocked or flagged as spam.
- Perform real-time monitoring during load tests and generate comprehensive post-test performance reports.
Advanced Monitoring And Observability
- Implement and maintain a comprehensive observability stack utilizing CloudWatch, Prometheus, and Grafana.
- Set up and maintain Synthetic Monitoring (proactive transactional testing) and Real User Monitoring (RUM) to track end-user performance and experience for the Corporate Website.
- Support distributed tracing efforts using Zipkin for application performance and dependency mapping.
Cloud Infrastructure And Data Management
- Manage core networking, including VPC, Network Firewall, Security Groups, Transit Gateway, and Direct Connect.
- Manage and ensure the high availability of the SQL Server Corporate Website Database, including the cross-region replica.
- Manage scalable file storage using Amazon S3 and implement archival policies using Amazon S3 Glacier.
- Administer the isolated EC2 instances for UAT (User Acceptance Testing) environments.
Requirements
- 5+ years of experience in IT infrastructure, with at least 3 years focused on AWS cloud engineering or DevOps roles.
- Expertise in AWS Networking (VPC, Transit Gateway, Direct Connect) and Advanced AWS Security (WAF, Shield, GuardDuty).
- Demonstrable experience managing and securing containerized workloads (ECS Fargate, ECR, Inspector).
- Proven ability to plan, execute, and report on Disaster Recovery drills and failover simulations.
- Hands-on experience with advanced monitoring platforms and practices: Prometheus, Grafana, Synthetic Monitoring, and Real User Monitoring (RUM).
Preferred Qualifications
- Direct experience with AWS CloudHSM operations and administration.
- Experience implementing and managing EDR solutions in a cloud environment.
- AWS Certified DevOps Engineer - Professional or AWS Certified Security - Specialty.
- Proficiency with Infrastructure as Code (IaC) tools (Terraform/CloudFormation).
This job was posted by Dileep Teja from WebileApps.