Senior DevOps / Platform Engineer - AI Platform
Years of Experience: 10+ years
Candidate Current Location: Bangalore only
Job Location: Bangalore
Notice Period: Immediate to 30 days
About Our Team
- Lenovo is building Quantum, a next‑generation hybrid AI platform that spans Windows, Android, and cloud. As part of this vision, we are expanding the engineering organization supporting Qira, Lenovo's cross‑device Personal AI.
- We are hiring Senior DevOps / Platform Engineers to build and operate the core automation, infrastructure, and service platforms that enable secure, reliable, and high‑velocity delivery of Qira's AI systems across device, edge, and cloud.
- Depending on your background and organizational need, you may be aligned to Platform Engineering, Observability, Operations, or Service Reliability.
- Qira operates with the speed, ownership, and creativity of a startup, supported by the scale, resources, and technical depth of Lenovo. We are building foundational systems from the ground up—intentionally, pragmatically, and with a culture of engineering excellence.
What You Might Work On
- As a Senior DevOps / Platform Engineer, you may be responsible for a subset of the following areas, depending on team placement:
CI/CD, Automation & Tooling
- Designing, implementing, and improving CI/CD pipelines for AI, platform, and application teams.
- Building automation and developer tooling to improve productivity and consistency.
- Developing infrastructure‑as‑code for cloud and hybrid environments (Terraform, Bicep, etc.).
Platform & Infrastructure Engineering
- Implementing scalable, secure, and resilient infrastructure on Azure and Kubernetes.
- Building and operating hybrid systems spanning device, edge, and cloud compute.
- Enabling reliable platform services that support inference, data pipelines, and high‑performance AI workloads.
Observability & Telemetry
- Implementing and enhancing observability systems using OpenTelemetry, Grafana, Prometheus, Loki, and related tooling.
- Ensuring platform telemetry is accurate, actionable, and tied to performance and reliability outcomes.
- Building dashboards and analytics for service health and operational insight.
Deployment & Release Engineering
- Improving deployment workflows, safety, consistency, and traceability.
- Supporting progressive delivery patterns including canaries, staged rollouts, and automated rollbacks.
- Optimizing CI/CD and deployment tooling for hybrid AI services.
Collaboration & Reliability Culture
- Partnering closely with SRE, AI/ML, security, firmware, and product engineering teams.
- Contributing to system design discussions with a focus on automation, scalability, and operational best practices.
- Helping define and evolve platform engineering standards, patterns, and conventions.
Basic Qualifications
- 10+ years of experience in DevOps, Platform Engineering, Cloud Engineering, or related fields
- Bachelor's Degree in Computer Science, Engineering, or a related technical field
- Strong experience building and operating infrastructure in Azure, AWS, or GCP
- Proficiency with CI/CD systems, build automation, and deployment pipelines
- Experience with Infrastructure as Code (Terraform, ARM/Bicep, CloudFormation, etc.)
- Strong development or scripting skills (Python, Go, Bash, or similar)
- Hands-on experience with Docker and Kubernetes
- Understanding of observability fundamentals (metrics, logs, tracing)
Preferred Qualifications
- Deep experience with Azure cloud architecture and DevOps tooling
- Strong hands‑on work with OpenTelemetry (instrumentation, pipelines)
- Experience with Grafana, Prometheus, Loki, Tempo, or similar observability tools
- Experience supporting AI/ML workloads or GPU‑accelerated compute environments
- Familiarity with event‑driven systems and operationalizing data pipelines
- Experience contributing to or running on‑call rotations
- Passion for automation, developer experience, and infrastructure reliability at scale
What Success Looks Like
- CI/CD pipelines are fast, stable, and trusted.
- Platform infrastructure becomes more automated, observable, and scalable.
- Telemetry and dashboards provide clear visibility into system health.
- Deployments are consistent, safe, and repeatable.
- Engineering teams move faster thanks to strong platform foundations.
- Qira's hybrid AI platform becomes increasingly reliable, efficient, and easy to operate.