Harbinger Systems Private Limited

Sr. Tech Lead - Site Reliability

Experience: 8-13 Years
Salary: 15-30 LPA
  • Posted a month ago

Job Description

We are seeking a highly skilled Senior Site Reliability Engineer (SRE) to join our team. This role involves ensuring the reliability, scalability, and efficiency of cloud infrastructure and applications while implementing SRE best practices for deployment, monitoring, and automation. As a senior member, you will lead efforts in system reliability, mentor junior engineers, and drive improvements in infrastructure automation.

Key Responsibilities:

Design, build, and maintain scalable and reliable cloud infrastructure.

Ensure System Reliability: Maintain uptime, scalability, and performance across production environments.

Monitoring & Alerting Setup: Configure real-time monitoring, alerting, and observability dashboards.

Automate Everything: Reduce toil by scripting repetitive tasks, automating CI/CD, and building self-healing mechanisms.

Incident Response & RCA: Own on-call rotations, resolve P1/P2 incidents, and create blameless postmortems.

Optimize Costs & Performance: Work on cloud cost optimization (FinOps), database tuning, and caching strategies.

Security & Compliance: Implement least privilege access, encryption, and vulnerability assessments.

Infrastructure as Code (IaC): Deploy and manage infrastructure with Terraform, Ansible, and Helm.

Capacity Planning & Scaling: Plan capacity and ensure effective load balancing, horizontal scaling, and traffic routing.

Process Documentation: Maintain detailed SOPs, incident response guides, and architecture diagrams.

Lead the implementation of CI/CD pipelines for application deployments.

Manage and optimize Kubernetes clusters and containerized workloads.

Collaborate with development and operations teams to ensure smooth deployment of applications.

Troubleshoot and resolve incidents, ensuring minimal downtime for production services.

Mentor and provide guidance to junior engineers, fostering a culture of reliability and automation.

Required Skills & Qualifications:

7+ years of experience in Site Reliability Engineering (SRE), DevOps, or cloud infrastructure roles.

Hands-on experience with cloud platforms (Azure).

Strong experience with CI/CD tools (GitHub Actions, Jenkins, or Azure Pipelines).

Proficiency in Python, Bash, or PowerShell for automation.

Extensive experience with Infrastructure as Code (Terraform).

Expertise in monitoring tools such as Datadog.

Strong understanding of networking, security, and containerization (Docker, Kubernetes).

Proven track record in leading and mentoring teams.

More Info

Open to candidates from: Indian

About Company

Harbinger is a global technology company that builds products and solutions that transform the way people work and learn. For more than three decades, we have been innovating alongside organizations that are in the people business—serving the Human Resources, eLearning, Digital Publishing, Education, and High-Tech sectors.

At Harbinger, we understand that building a great product requires in-depth knowledge of the user, the nuances of the business, and expertise in technology. That is why we provide both end-to-end Product Development and Content Creation services.

Our pedigree in eLearning and building next-generation products has fostered a culture of continuous learning. We experiment with new technologies such as Generative AI, easily embrace new ideas, and creatively apply them to our customers’ products.

Job ID: 105708707