Cloud Engineer

PwC India

Bengaluru, India

5-7 Years

Job Description

Experience: 5 years

Location: Bengaluru

Role Title

Azure Site Reliability Engineer (SRE)

Role Summary

We are hiring Azure SREs to engineer reliability at scale across mission-critical workloads in a regulated environment. You will design and operate highly available, secure, and cost-efficient Azure platforms with a Terraform-first approach, strong automation, and deep observability. The role includes on-call duty, incident management, and continuous improvement to reduce toil and improve SLAs/SLOs.

Key Responsibilities

SRE Foundations

Define SLIs/SLOs, manage error budgets, and gate releases based on reliability risk.
Lead on-call rotations, major incident response, and blameless postmortems with action tracking.
Run game days, chaos/resilience drills, and drive toil reduction via automation.

Azure Platform & Governance

Build CAF-aligned Landing Zones (hub-spoke/Virtual WAN); enforce Azure Policy as Code, tagging, and RBAC/PIM models.
Engineer secure network topologies: Private Link/Endpoints, Azure Firewall/WAF, DDoS, ExpressRoute, Private DNS.

Infrastructure as Code & Automation

Terraform (mandatory): design reusable modules, manage remote state & locking, implement policy checks (e.g., tfsec/Checkov/Conftest).
Implement CI/CD with Azure DevOps/GitHub Actions; automate with PowerShell, Azure CLI, Python.
Use Key Vault & workload identity for secretless pipelines; enforce PR reviews and plan/apply gates.

Kubernetes (AKS) Operations

Operate AKS: upgrades (surge), node pool management, HPA/VPA, cluster autoscaler.
Enforce Network Policies, Pod Security, admission control (OPA/Gatekeeper); secure secrets and images.
Run GitOps (Flux/Argo CD) with a hardened ACR, image provenance, and supply chain controls.

Observability & AIOps

Build full-stack monitoring with Azure Monitor, Log Analytics, Application Insights, Prometheus/Grafana.
Create KQL dashboards/alerts, enable synthetic monitoring, and correlate traces with OpenTelemetry.
Reduce MTTR using automated runbooks (Functions/Logic Apps/Automation) and optimize log/metrics cost.

Resilience, DR & Backup

Architect HA/DR using Azure Site Recovery (ASR) and region pairs; define & test RTO/RPO.
Operate Azure Backup with immutability/soft delete; enable Key Vault purge protection.
Conduct periodic failover/restore drills with evidence and remediation follow-ups.

Security & Compliance

Implement Zero Trust with Entra ID (RBAC, PIM, Conditional Access), Managed Identities, and least privilege.
Enforce baselines with Defender for Cloud; integrate Microsoft Sentinel detections and SOAR playbooks.
Support audits with change control, evidence, and segregation of duties.

Cost & Capacity (FinOps)

Set budgets and alerts; apply rightsizing, reservations/savings plans, and storage tiering.
Optimize observability/storage retention and data flows for cost efficiency.

Required Qualifications

6+ years of overall IT industry experience, including at least 5 years of hands-on expertise in Azure Site Reliability Engineering.
Hands-on Terraform (mandatory): module design, state management, pipelines, policy/scanning, drift detection.
Strong Azure infrastructure: compute, storage, networking (hub-spoke/vWAN, Private Link, Firewall/WAF, DDoS, ExpressRoute).
AKS operations and container security fundamentals.
Observability: Azure Monitor, App Insights, KQL, Prometheus/Grafana; SLO dashboarding.
DR/Backup expertise: ASR, Azure Backup, RTO/RPO planning and test execution.
Automation proficiency: PowerShell, Azure CLI, Python; Azure Functions/Logic Apps/Automation Accounts.
Identity & security: Entra ID, RBAC/PIM, Key Vault, Defender for Cloud.
Certifications: AZ-104 (mandatory).

Nice to Have

Microsoft Sentinel (detections, hunting, SOAR runbooks).
Chaos Studio, performance/load testing, progressive delivery (Blue/Green, Canary, feature flags).
Data HA/DR across Azure SQL DB/MI, PostgreSQL Flexible Server.
FinOps practices and cost optimization playbooks.
Certifications: AZ-305, AZ-400, AZ-700, AZ-500.