Senior Site Reliability Engineer

Lenovo

Bengaluru, India

8-10 Years

Job Description

We are Lenovo. We do what we say. We own what we do. We WOW our customers.

Lenovo is a US$69 billion revenue global technology powerhouse, ranked #196 in the Fortune Global 500, and serving millions of customers every day in 180 markets. Focused on a bold vision to deliver Smarter Technology for All, Lenovo has built on its success as the world's largest PC company with a full-stack portfolio of AI-enabled, AI-ready, and AI-optimized devices (PCs, workstations, smartphones, tablets), infrastructure (server, storage, edge, high performance computing and software defined infrastructure), software, solutions, and services. Lenovo's continued investment in world-changing innovation is building a more equitable, trustworthy, and smarter future for everyone, everywhere. Lenovo is listed on the Hong Kong stock exchange under Lenovo Group Limited (HKSE: 992) (ADR: LNVGY).

This transformation, together with Lenovo's world-changing innovation, is building a more inclusive, trustworthy, and smarter future for everyone, everywhere. To find out more, visit www.lenovo.com and read the latest news via our StoryHub.

Senior Site Reliability Engineer (SRE)

About Our Team

Lenovo is building Quantum, a next-generation hybrid AI platform that spans Windows, Android, and cloud. As part of this vision, we are expanding the reliability engineering organization that powers Qira, Lenovo's cross-device Personal AI.

We are looking for Senior Site Reliability Engineers (SREs) to help us build and evolve the foundational reliability, observability, and operations capabilities that ensure Qira is fast, safe, and dependable for millions of users.

This role may support one of several teams within the SRE organization (e.g., Observability, Operations, or Service Reliability), depending on your strengths and interests.

Qira is operating with the speed, ownership, and creative latitude of a startup, yet supported by the scale, resources, and technical depth of Lenovo. We are building new systems, new tooling, and new operational models from the ground up, and we are doing so with clarity, intention, and high engineering standards.

What You Might Work On

As a Senior SRE, you may be responsible for a subset of the following, depending on team placement and skill alignment:

Reliability & Performance Engineering

Improving the availability, scalability, and performance of distributed systems across device, edge, and cloud.
Defining or refining SLIs, SLOs, and error budgets for critical services.
Leading initiatives to remove single points of failure, improve resilience, and reduce operational risk.

Operational Excellence

Participating in on-call rotations and contributing to incident response, triage, and post-incident reviews.
Developing automation, runbooks, and self-healing systems to reduce alert noise and MTTR.
Enhancing operational readiness and supporting incident prevention programs.

Observability & Insight

Designing or improving observability systems using OpenTelemetry, Grafana, and modern signal pipelines.
Building dashboards, analytics, and alerting that illuminate system health and AI service behavior.
Ensuring telemetry is reliable, actionable, and tied to real-world outcomes.

Deployments & Change Safety

Improving reliability of CI/CD workflows, including phased rollouts, canaries, shadow testing, and safe rollback mechanisms.
Contributing to the evolution of deployment tooling for device + edge + cloud hybrid systems.

Systems Design & Collaboration

Influencing architectural decisions by injecting reliability, observability, and operational considerations early in design.
Collaborating with AI/ML engineers, platform engineers, firmware teams, and product partners to deliver robust, dependable user experiences.

Basic Qualifications

8+ years of experience in Site Reliability Engineering, Production Engineering, DevOps, or large-scale distributed systems operations
Bachelor's Degree in Computer Science, Engineering, or a related technical discipline
Strong experience running production distributed systems at scale
Proficiency in at least one modern programming language (e.g., Python, Go, Java, C++)
Strong understanding of Linux systems, networking fundamentals, and system performance tuning
Experience with monitoring/observability (metrics, logs, tracing)
Hands-on experience with cloud environments (Azure, AWS, or GCP)
Experience in incident management, on-call rotations, and postmortem processes

Preferred Qualifications

Deep experience with Azure cloud services
Experience with OpenTelemetry for end-to-end instrumentation
Strong familiarity with Grafana, Prometheus, Loki, Tempo, or similar tools
Experience supporting AI/ML systems, model serving, or data-intensive workloads
Background with hybrid architectures (device + edge + cloud)
Experience improving deployment reliability and progressive delivery systems
Passion for automation, reliability engineering, and reducing operational friction

What Success Looks Like

Systems become more observable, reliable, and predictable.
Incidents are resolved quickly, and follow-up improvements prevent recurrence.
Alerting becomes more accurate, actionable, and trusted.
Deployments become safer and more consistent.
Teams move faster because reliability foundations are strong and intuitive.

We are an Equal Opportunity Employer and do not discriminate against any employee or applicant for employment because of race, color, sex, age, religion, sexual orientation, gender identity, national origin, status as a veteran, disability, or any federal, state, or local protected class.