
ACG World

Senior Site Reliability Engineer

  • Posted 10 hours ago

Job Description

ROLE TITLE: Azure Cloud Infrastructure & Site Reliability Engineer

This is a cloud infrastructure and operations-first role. The candidate must act as a technical bridge between software development, QA, and mechanical hardware teams. You will own the Azure cloud environment, drive SRE practices (SLIs/SLOs), and enforce Code-First delivery standards to reduce toil and keep hardware-software integration stable.

  1. Primary responsibilities

  • Azure Architecture & Hybrid Infrastructure
  • Architect and manage scalable Azure environments (App Service, APIM, Cosmos DB, AKS, SQL).
  • Manage hybrid connectivity and hardware integration: VNets, NSGs, Private Endpoints, and ExpressRoute/VPN.
  • Lead IaC adoption: Maintain production-grade Terraform or Bicep modules with strict state management.
  • Own the Code-First lifecycle: Enforce a zero-manual-change (no Click-Ops) policy in production.
  • Coordinate hardware-software integration: Validate compatibility through simulations and real-time testing in collaboration with Mechanical/Operations teams.
  • SRE Practices & Incident Management
  • Define and maintain SLIs/SLOs for all customer-facing services; publish error budgets monthly.
  • Own end-to-end incident management: detection, triage, war room coordination, and blameless post-mortems.
  • Track MTTR, MTTD, and change failure rates; present performance trends to engineering/CTO leadership.
  • Drive toil reduction: Identify manual operational bottlenecks and automate at least 30% of them per quarter.
  • Monitoring, Observability & Security
  • Own the monitoring stack: Azure Monitor, Log Analytics, Application Insights, Prometheus, and Grafana.
  • Implement distributed tracing and robust alerting; eliminate noisy false-positive alerts.
  • Enforce security/compliance (ISO, GxP, GDPR): Implement RBAC, Managed Identities, and Azure Key Vault policies.
  • Use Microsoft Defender for Cloud and Sentinel to conduct vulnerability assessments and risk mitigation.
  • DevOps, Automation & Agile Delivery
  • Maintain Azure DevOps (YAML) pipelines for infrastructure, environment refresh, and config drift correction.
  • Translate business requirements into technical deliverables and work items in Azure Boards.
  • Act as the infrastructure approver for all UAT and Production deployments.
  • Scripting & Automation: Develop production-grade automation in Python, Bash, or PowerShell for backup verification, cert rotation, and log archival.
  • Collaboration and Support:
  • Work closely with development, QA, and operations teams to ensure smooth delivery of applications.
  • Provide guidance and support to development teams on best practices for cloud architecture and DevOps.
  • Conduct training and knowledge-sharing sessions for team members.
  • Documentation and Reporting:
  • Maintain comprehensive documentation of infrastructure, configurations, and procedures.
  • Generate reports and dashboards to provide visibility into system performance, deployments, and issues.
  • Knowledge and understanding of ITIL
  • Operating System:
  • Windows and Linux Proficiency
  • Certifications
  • Azure certifications (e.g., Microsoft Certified: Azure Solutions Architect, Azure DevOps Engineer).
  • Experience:
  • 6+ years of total experience.
  • Proven experience in Azure infrastructure management and DevOps practices.
  • Hands-on experience with Azure services (e.g., VMs, AKS, App Services, Azure SQL, Functions, etc.).
  • Strong knowledge of CI/CD tools and practices.
  • Familiarity with project management tools like Jira, Azure Boards.
  • Technical Skills:
  • Microsoft Azure Services: Expertise in Azure DevOps (Repos, Pipelines, Boards, Artifacts), Azure Monitor, Application Insights, Key Vault, Security Center, Azure Active Directory, VMs, AKS, App Services, SQL Database, Azure Functions. AZ-104, AZ-400, and AZ-305 certifications.
  • CI/CD and Automation Tools: Strong hands-on experience with Git, Jenkins, GitHub Actions, Azure Pipelines, and automation tools like Ansible.
  • PaaS/SaaS Depth: Hands-on experience with Azure PaaS (APIM, Cosmos DB, Functions, AKS) and SaaS service integration.
  • IaC & Automation: Production-grade Terraform or Bicep; experience with Ansible for configuration management.
  • Scripting and Automation: PowerShell, Python, Bash, YAML for task automation and infrastructure management.
  • Containerization and Orchestration: Docker, Kubernetes (AKS) for container management and orchestration.
  • Monitoring and Logging: Experience with Azure Monitor, Application Insights, Log Analytics, Sentinel for performance monitoring and alerting.
  • Security and Compliance: Familiarity with Microsoft Defender for Cloud, Azure Security Center, OWASP, Static Code Analysis tools, and implementing RBAC, secure networking, and identity management.
  • Version Control: Git, GitHub, Azure Repos for version control and collaboration.
  • Databases: Hands-on experience with Azure SQL, Cosmos DB, MySQL, and MS SQL Server.
  • Operating Systems: Windows Server and Linux (Ubuntu, Red Hat).
  • Networking: Configuring Virtual Networks, Load Balancers, VPN Gateways, and NSG rules, and implementing cloud networking best practices.
  • DevOps and Agile Methodologies: Strong understanding of agile development, cloud automation, and DevOps practices.
  • Soft Skills:
  • Excellent problem-solving and troubleshooting skills.
  • Strong communication and collaboration abilities.
  • Ability to work in a fast-paced, dynamic environment.
  • Maintain clear communication and coordination with cross-functional teams.

Internal Interfaces

External Interfaces

  1. Competency

Technical Competency – T Compass

Leadership Competency – L Compass

Competency: Level

  • Azure Cloud Architecture & PaaS/SaaS: Expert
  • Site Reliability Engineering (SRE): Expert
  • IaC (Terraform / Bicep): Advanced
  • Azure DevOps (YAML Pipelines): Advanced
  • Problem Solving & Incident Analysis: Advanced
  • Azure Boards (Agile): Advanced
  • Windows Server Administration: Medium
  • Linux: Medium
  • Cloud Security & Compliance (GxP/ISO): Medium
  • Scripting (Python / PowerShell / Bash): Advanced

  1. Educational and Experience Requirements

  • Level of Education: Graduate (minimum requirement); Engineering Graduate (desired)
  • Experience: 6+ years (minimum requirement); 6+ years (desired)


Job ID: 149627481
