Cloud, Platform & SRE Lead

National E-Governance Division

Delhi, India

10-14 Years

Posted 19 hours ago

Job Description

Role: Cloud, Platform & SRE Lead

Experience: 10–14 years

Location: New Delhi (on-site)

About the Role

SIDH runs on AWS across ROSA/OpenShift, ECS/EKS, PostgreSQL, MongoDB, Redis, Elasticsearch, and Kafka, with cloud spend of ₹30+ Cr in FY25. It currently has no multi-region DR plan, inconsistent environments across Dev/UAT/Prod, WAF in monitoring-only mode, and idle resources taking 45–60 days to remediate. The Cloud, Platform & SRE Lead will bring engineering discipline to infrastructure governance, reliability, and cost.

Key Responsibilities

Define the target cloud and platform architecture across AWS accounts, container platforms, networking, shared services, observability, and deployment standards
Own SRE governance: SLIs/SLOs, error budgets, incident review standards, reliability gates, capacity planning, and service classification
Drive remediation of assessed gaps — no multi-region DR, environment version inconsistency, manual deployments, weak WAF enforcement, idle assets, absent retention policies
Lead FinOps governance: tagging standards, rightsizing decisions, idle asset elimination, monthly cost reviews, and application tiering
Define backup retention, resilience standards, failover/failback testing, and RTO/RPO expectations across all workloads
Set standards for observability, logging, metrics, tracing, alerting, and post-incident review
Govern infra/platform vendors and validate production readiness for new workloads, integrations, and releases
Partner with QA and Security to embed SAST/DAST, patching, and operational controls in the engineering lifecycle

What We Are Looking For

Deep expertise in AWS architecture, Kubernetes, Terraform, and observability stacks (Prometheus, Grafana, ELK/OpenSearch) at scale
Practical SRE experience defining service reliability models, incident frameworks, and capacity planning disciplines
Experience with DR design, backup policy, patch governance, WAF/security control integration, and cloud cost management
This is NOT a pure DevOps role: candidates must have architectural ownership and governance experience, not just operational execution

Good to Have

Familiarity with ROSA/OpenShift, ECS/EKS, Dynatrace or CloudWatch at scale, and enterprise FinOps practices
GovCloud migration experience (MeitY-empanelled cloud providers)
Experience in programmes running regulated data workloads at national scale