Platform Engineer (SRE)

  • Posted a day ago

Job Description

JD – Platform Engineer (SRE) 

 

Experience: 10+ years in SRE / DevOps / Platform Engineering  

Work Mode: Remote  

Position Type: Contract 

 

Job Summary 

We are seeking a highly experienced Platform Engineer – Site Reliability Engineer (SRE) to design, build, and operate secure, scalable, and highly available cloud-native platforms on Microsoft Azure. The ideal candidate will have deep hands-on expertise in Azure Kubernetes Service (AKS), Infrastructure as Code, automation, and observability, with a strong focus on reliability and operational excellence. This role combines software engineering and infrastructure skills to ensure resilient platforms, frictionless deployments, and a strong developer experience. 

 

Must-Have Skills 

  • 10+ years of experience in SRE / DevOps / Platform Engineering roles with production ownership. 
  • Strong hands-on experience with Microsoft Azure services (networking, compute, storage, security, identity). 
  • Expertise in Azure Kubernetes Service (AKS) – cluster design, administration, troubleshooting, and scaling. 
  • Strong experience with Docker and Kubernetes ecosystem tools (ingress controllers, Helm, etc.). 
  • Proficiency in Infrastructure as Code using Terraform (modules, state management, reusable patterns). 
  • Experience building and maintaining CI/CD pipelines (Azure DevOps preferred; GitHub Actions/Jenkins acceptable). 
  • Strong scripting/programming skills in Python, Go, and/or Bash for automation and tooling. 
  • Solid understanding of Linux fundamentals, networking concepts, and distributed systems. 
  • Proven experience in incident management, root cause analysis, and leading production support. 
  • Hands-on experience implementing observability (logs, metrics, traces, alerts) for platforms and services. 

Good to Have Skills 

  • Experience designing and operating multi-region, highly available architectures on Azure. 
  • Knowledge of service mesh, ingress, and traffic management patterns in Kubernetes (e.g., Istio, NGINX, API gateways). 
  • Experience with GitOps practices and tools (e.g., Argo CD, Flux). 
  • Familiarity with security and compliance frameworks in enterprise environments. 
  • Azure certifications (e.g., AZ-104, AZ-400, AZ-305) or Kubernetes certifications such as CKA/CKAD. 
  • Experience improving developer experience through self-service platforms, templates, and golden paths. 

Roles & Responsibilities 

Platform Engineering & Azure Infrastructure 

  • Design, build, and maintain secure, scalable, cloud-native platforms on Microsoft Azure for production workloads. 
  • Architect, provision, and manage Azure Kubernetes Service (AKS) clusters, including cluster upgrades, scaling, and capacity planning. 
  • Implement Infrastructure as Code (IaC) using Terraform to provision and manage Azure resources consistently and repeatably. 
  • Design and implement multi-region, highly available architectures leveraging native Azure services and best practices. 

Kubernetes & Container Orchestration (AKS Focus) 

  • Deploy, manage, and optimize AKS clusters, ensuring reliability, performance, and cost efficiency. 
  • Enforce Kubernetes best practices including RBAC, network policies, pod security, resource limits/requests, and node management. 
  • Manage containerized workloads using Docker, including image build, optimization, and registry management. 
  • Troubleshoot cluster performance, networking, and workload issues across pods, services, and ingress/egress paths. 

Reliability & Operations 

  • Lead incident response for platform-related issues, including on-call participation, triage, mitigation, and communication. 
  • Conduct post-incident reviews and drive corrective and preventive actions to improve platform reliability. 
  • Implement and continuously improve observability for platform components (metrics, logs, traces, dashboards, alerts). 
  • Ensure high availability, minimal downtime, and scalability for platform services through proactive capacity and reliability planning. 

Automation & DevOps 

  • Build and maintain CI/CD pipelines for infrastructure and application deployments (preferably using Azure DevOps). 
  • Automate infrastructure provisioning, configuration, deployments, and operational runbooks to reduce manual effort and errors. 
  • Implement and evolve self-service platform capabilities for development teams (templates, pipelines, standardized environments). 
  • Work closely with development and security teams to integrate DevOps and SRE best practices across the lifecycle. 

Security & Governance 

  • Implement Azure security best practices, including Managed Identities, Key Vault, RBAC, and network security controls. 
  • Secure AKS clusters using network policies, pod identity, secrets management, and least-privilege access controls. 
  • Ensure compliance with enterprise security, governance, and audit requirements across the platform. 
  • Collaborate with security and compliance teams to address vulnerabilities, harden configurations, and maintain secure baselines. 

More Info


Job ID: 147172819
