Site Reliability Engineer

resource algorithm

Bengaluru, India

Fresher

Posted 6 hours ago

Job Description

We are seeking an experienced and dynamic Site Reliability Engineering (SRE) Lead to oversee the reliability, scalability, and performance of our critical systems. As an SRE Lead, you will play a pivotal role in establishing and implementing SRE practices, leading a team of engineers, and driving automation, monitoring, and incident response strategies. This position combines software engineering and systems engineering expertise to build and maintain high-performing, reliable systems.

Drive AI-Powered Reliability: Leverage AI and LLM-based tools to automate alert analysis, incident triage, and root-cause detection for faster, smarter recovery.
Performance Optimization at Scale: Tune JVMs, optimize distributed systems, and analyze end-to-end performance across microservices, APIs, and databases.
Observability Evolution: Shape the next-generation observability stack and AI-driven anomaly detection.
Automation-First Mindset: Develop automation that reduces manual effort, improves efficiency, and ensures faster, more consistent incident resolution.
Cloud-Native Optimization: Design and enhance scalable workloads running on Kubernetes and modern cloud infrastructure.
End-to-End Ownership: Influence architecture decisions across engineering teams, ensuring reliability and performance are built into every release.
Continuous Improvement Culture: Lead initiatives to identify inefficiencies, optimize performance metrics, and drive data-informed decision-making.
Innovation in SRE: Be part of an engineering culture that encourages experimentation, automation, and AI integration to redefine traditional SRE practices.

Reliability & Performance:

Lead efforts to maintain high availability and reliability of critical services.
Define and monitor SLIs, SLOs, and SLAs to ensure business requirements are met.
Proactively identify and resolve performance bottlenecks and system inefficiencies.

Incident Management & Response:

Establish and improve incident management processes and on-call rotations.
Lead incident response and root cause analysis for high-priority outages.
Drive post-incident reviews and ensure actionable insights are implemented.

Automation & Tooling:

Develop and implement automated solutions to reduce manual operational tasks.
Enhance system observability through metrics, logging, and distributed tracing tools (e.g., Prometheus, Grafana, Elastic APM).
Optimize CI/CD pipelines for seamless deployments.

Collaboration:

Partner with software engineering teams to improve the reliability of applications and infrastructure.
Work closely with product and engineering teams to design scalable and robust systems.
Ensure seamless integration of monitoring and alerting systems across teams.

Leadership & Team Building:

Manage, mentor, and grow a team of SREs.
Promote SRE best practices and foster a culture of reliability and performance across the organization.
Drive performance reviews, skills development, and career progression for team members.

More Info

Job Type: Permanent Job

Industry: Other

Function: Site Reliability Engineering

Employment Type: Full time

About Company

resource algorithm

Job Source: www.linkedin.com

Job ID: 147470907

Last Updated: 15-05-2026 07:59:36 PM
