Role: SRE Lead Engineer
Skills: Docker, Prometheus, Grafana, ELK, Datadog
Location: Noida
Experience: 8+ Years
Mode: Work from office
We at Coforge are hiring a highly skilled and experienced SRE Lead Engineer to drive reliability, scalability, and performance across our infrastructure and applications. You will lead a team of SREs, collaborate with development and operations teams, and implement best practices to ensure high availability and resilience of our systems.
Key Responsibilities:
- Lead and mentor a team of SREs to build scalable and reliable systems.
- Design and implement monitoring, alerting, and incident response strategies.
- Drive automation of operational tasks and improve deployment pipelines.
- Collaborate with software engineers to ensure reliability is built into the product from the ground up.
- Conduct root cause analysis and postmortems for production incidents.
- Define and track SLAs, SLOs, and SLIs to measure and improve system reliability.
- Champion DevOps and SRE best practices across the organization.
- Manage capacity planning and performance tuning.
- Ensure security and compliance standards are met in infrastructure operations.
Required Qualifications:
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- 8+ years of experience in software engineering, DevOps, or SRE roles.
- Strong experience with the Azure platform.
- Proficiency in programming/scripting languages (Python, Go, Bash, etc.).
- Expertise in CI/CD tools (Jenkins, GitLab CI, etc.).
- Deep understanding of containerization and orchestration (Docker, Kubernetes).
- Experience with observability tools (Prometheus, Grafana, ELK, Datadog).
- Excellent problem-solving and communication skills.
- Proven leadership experience in technical teams.
Preferred Qualifications:
- Certifications in cloud technologies or DevOps practices.
- Experience with Infrastructure as Code (Terraform, Ansible).
- Familiarity with chaos engineering and resilience testing.
- Exposure to regulatory compliance (e.g., SOC 2, ISO 27001).