National E-Governance Division

Cloud, Platform & SRE Lead

10-14 Years
  • Posted 18 hours ago

Job Description

Role: Cloud, Platform & SRE Lead

Experience: 10–14 years

Location: New Delhi (on-site)

About the Role

SIDH runs on AWS across ROSA/OpenShift, ECS/EKS, PostgreSQL, MongoDB, Redis, Elasticsearch, and Kafka, with cloud spend of ₹30+ Cr in FY25. It currently has no multi-region DR plan, inconsistent environments across Dev/UAT/Prod, a WAF running in monitoring-only mode, and idle resources that take 45–60 days to remediate. The Cloud, Platform & SRE Lead will bring engineering discipline to infrastructure governance, reliability, and cost.

Key Responsibilities

  • Define the target cloud and platform architecture across AWS accounts, container platforms, networking, shared services, observability, and deployment standards
  • Own SRE governance: SLIs/SLOs, error budgets, incident review standards, reliability gates, capacity planning, and service classification
  • Drive remediation of assessed gaps: no multi-region DR, environment version inconsistency, manual deployments, weak WAF enforcement, idle assets, and absent retention policies
  • Lead FinOps governance: tagging standards, rightsizing decisions, idle asset elimination, monthly cost reviews, and application tiering
  • Define backup retention, resilience standards, failover/failback testing, and RTO/RPO expectations across all workloads
  • Set standards for observability, logging, metrics, tracing, alerting, and post-incident review
  • Govern infra/platform vendors and validate production readiness for new workloads, integrations, and releases
  • Partner with QA and Security to embed SAST/DAST, patching, and operational controls in the engineering lifecycle

What We Are Looking For

  • Deep expertise in AWS architecture, Kubernetes, Terraform, and observability stacks (Prometheus, Grafana, ELK/OpenSearch) at scale
  • Practical SRE experience defining service reliability models, incident frameworks, and capacity planning disciplines
  • Experience with DR design, backup policy, patch governance, WAF/security control integration, and cloud cost management
  • This is NOT a pure DevOps role: candidates must have architectural ownership and governance experience, not just operational execution

Good to Have

  • Familiarity with ROSA/OpenShift, ECS/EKS, Dynatrace or CloudWatch at scale, and enterprise FinOps practices
  • GovCloud migration experience (MeitY-empanelled cloud providers)
  • Experience in programmes running regulated data workloads at national scale

More Info

Job ID: 147202113