Site Reliability Engineering Lead

7-9 Years

Job Description

Role: Site Reliability Engineering Lead

Location: Chennai (Hybrid)

Experience Required: 7+ years of experience in SRE, DevOps, or Cloud Infrastructure, with at least 2 years in a lead/mentoring role

Roles and Responsibilities:

Deep AWS expertise (EC2, S3, RDS, IAM, VPC, Lambda, CloudFormation/Terraform, etc.).
Strong knowledge of Infrastructure-as-Code (IaC) using Terraform, AWS CDK, or CloudFormation.
Proven experience with CI/CD tools (Jenkins, GitHub Actions, GitLab CI, or similar).
Proficiency in containerization and orchestration (Docker, Kubernetes, ECS, or EKS).
Expertise in monitoring and observability tools (Datadog, New Relic, Prometheus, Grafana, ELK, CloudWatch, etc.).
Strong scripting or programming background (Python, Bash, or Go).
Sound understanding of networking, security, and identity/access management in the cloud.
Experience designing high-availability and disaster recovery strategies for critical workloads.
Excellent communication, problem-solving, and leadership skills with the ability to influence across teams.

Desired Skills:

AWS or other cloud certification (Solutions Architect, DevOps Engineer, etc.).
Experience with AIOps, Serverless Architectures, and event-driven systems.
Familiarity with FinOps practices and cost optimization frameworks.
Experience with SaaS monitoring tools (Datadog, New Relic, Sumo Logic, PagerDuty).
Exposure to Atlassian tools (Jira, Confluence, Bitbucket).
Experience with SQL/NoSQL databases.
Proven track record of leading cross-functional reliability initiatives or platform-wide automation projects.