Role: Cloud, Platform & SRE Lead
Experience: 10–14 years
Location: New Delhi (on-site)
About the Role
SIDH runs on AWS across ROSA/OpenShift, ECS/EKS, PostgreSQL, MongoDB, Redis, Elasticsearch, and Kafka, with cloud spend of ₹30+ Cr in FY25. It currently lacks a multi-region DR plan, runs inconsistent environments across Dev/UAT/Prod, keeps its WAF in monitoring-only mode, and takes 45–60 days to remediate idle resources. The Cloud, Platform & SRE Lead will bring engineering discipline to infrastructure governance, reliability, and cost.
Key Responsibilities
- Define the target cloud and platform architecture across AWS accounts, container platforms, networking, shared services, observability, and deployment standards
- Own SRE governance: SLIs/SLOs, error budgets, incident review standards, reliability gates, capacity planning, and service classification
- Drive remediation of assessed gaps: the missing multi-region DR plan, environment version inconsistency, manual deployments, weak WAF enforcement, idle assets, and absent retention policies
- Lead FinOps governance: tagging standards, rightsizing decisions, idle asset elimination, monthly cost reviews, and application tiering
- Define backup retention, resilience standards, failover/failback testing, and RTO/RPO expectations across all workloads
- Set standards for observability, logging, metrics, tracing, alerting, and post-incident review
- Govern infra/platform vendors and validate production readiness for new workloads, integrations, and releases
- Partner with QA and Security to embed SAST/DAST, patching, and operational controls in the engineering lifecycle
What We Are Looking For
- Deep expertise in AWS architecture, Kubernetes, Terraform, and observability stacks (Prometheus, Grafana, ELK/OpenSearch) at scale
- Practical SRE experience defining service reliability models, incident frameworks, and capacity planning disciplines
- Experience with DR design, backup policy, patch governance, WAF/security control integration, and cloud cost management
- This is NOT a pure DevOps role: candidates must have architectural ownership and governance experience, not just operational execution
Good to Have
- Familiarity with ROSA/OpenShift, ECS/EKS, Dynatrace or CloudWatch at scale, and enterprise FinOps practices
- GovCloud migration experience (MeitY-empanelled cloud providers)
- Experience in programmes running regulated data workloads at national scale