Role: Site Reliability Engineering Lead
Location: Chennai (Hybrid)
Experience Required: 7+ years of experience in SRE, DevOps, or Cloud Infrastructure, with at least 2 years in a lead/mentoring role
Roles and Responsibilities:
- Deep AWS expertise (EC2, S3, RDS, IAM, VPC, Lambda, etc.).
- Strong knowledge of Infrastructure-as-Code (IaC) using Terraform, AWS CDK, or CloudFormation.
- Proven experience with CI/CD tools (Jenkins, GitHub Actions, GitLab CI, or similar).
- Proficiency in containerization and orchestration (Docker, Kubernetes, ECS, or EKS).
- Expertise in monitoring and observability tools (Datadog, New Relic, Prometheus, Grafana, ELK, CloudWatch, etc.).
- Strong scripting or programming background (Python, Bash, or Go).
- Sound understanding of networking, security, and identity/access management in the cloud.
- Experience designing high-availability and disaster recovery strategies for critical workloads.
- Excellent communication, problem-solving, and leadership skills with the ability to influence across teams.
Desired Skills:
- AWS or other cloud certification (Solutions Architect, DevOps Engineer, etc.).
- Experience with AIOps, Serverless Architectures, and event-driven systems.
- Familiarity with FinOps practices and cost optimization frameworks.
- Experience with SaaS monitoring tools (Datadog, New Relic, Sumo Logic, PagerDuty).
- Exposure to Atlassian tools (Jira, Confluence, Bitbucket).
- Experience with SQL/NoSQL databases.
- Proven track record of leading cross-functional reliability initiatives or platform-wide automation projects.