Overview
- Proven experience operating and troubleshooting resources in large-scale AWS cloud environments across multiple accounts and environments.
- Strong understanding of Site Reliability Engineering (SRE) principles, operational KPIs, and reliability engineering practices.
- Expertise in AWS governance frameworks, FinOps practices, cost optimization, and risk management.
- Oversee the AWS infrastructure lifecycle, including provisioning, monitoring, performance tuning, and optimization.
- Ensure compliance with security, governance, and regulatory standards across AWS accounts and services.
Responsibilities
- Deep knowledge of AWS services including EC2, VPC, S3, RDS, Lambda, EKS/ECS, IAM, CloudFront, Route 53, and AWS Security services.
- Strong experience with Infrastructure as Code tools such as Terraform, AWS CloudFormation, or CDK.
- Proficiency in automation scripting using Python, Bash, or PowerShell.
- Familiarity with CI/CD tools such as AWS CodePipeline, GitHub Actions, Jenkins, or GitLab CI.
- Advanced understanding of monitoring and observability tools such as Amazon CloudWatch, AWS X-Ray, Prometheus, and Grafana.
Qualifications
6-8 years of core AWS provisioning and troubleshooting experience in large-scale environments, with a B.Tech degree and, preferably, the certifications below:
- AWS Certified Solutions Architect - Associate
- AWS Certified DevOps Engineer - Professional
- ITIL 4 or an equivalent certification in operational process management