
  • Posted 21 hours ago

Job Description

Job Title: Site Reliability Engineer (SRE) – Azure Cloud

Experience: 8+ Years

Work Model: 24×7 shift-based operations (rotational shifts, including weekends and on-call support)

Location: Remote (first week at the Kochi/Thiruvananthapuram office for training)

Salary: As per industry standards

Key Responsibilities

Azure Infrastructure Management (Mandatory)

  • Manage and support Microsoft Azure infrastructure, ensuring high availability, scalability, and security.
  • Administer Azure services, including:
      • Virtual Machines (Windows and Linux)
      • Virtual Networks (VNets), subnets, network security groups (NSGs), and user-defined routes (UDRs)
      • Load Balancers and Application Gateways
      • Azure Firewall, VPN Gateways, and ExpressRoute
  • Support Microsoft Entra ID (formerly Azure Active Directory), including RBAC and identity and access management.
  • Manage Azure Storage services (Blob, File, Disk, Queue, Table).
  • Provide L1/L2 support for Azure PaaS services such as App Service, Azure SQL Database, Azure SQL Managed Instance, and Azure Kubernetes Service (AKS).
  • Perform capacity planning, performance tuning, and cost optimization.

Monitoring & Observability (Mandatory – Datadog)

  • Perform real-time monitoring using:
      • Datadog (mandatory)
      • Azure Monitor, Log Analytics, and Application Insights
  • Configure alerts and dashboards, and define proactive monitoring strategies.
  • Identify system anomalies and ensure rapid incident response.

Networking & Firewall Management (Mandatory)

  • Troubleshoot and manage:
      • TCP/IP, DNS, routing, and VPNs
      • WAN/LAN connectivity issues
  • Administer enterprise firewalls such as Fortinet FortiGate (preferred).
  • Configure:
      • Site-to-site and client VPNs
      • Firewall policies and routing rules
  • Collaborate with network teams to ensure secure and stable connectivity.

System Administration & Support

  • Manage Windows Server environments, including Active Directory, Remote Desktop Services (RDS), and file and print services.
  • Perform OS patching, system maintenance, backups, and recovery.
  • Provide remote support for customer environments.

Incident Management & Customer Support

  • Take end-to-end ownership of incidents from detection to resolution.
  • Perform root cause analysis (RCA) and implement preventive measures.
  • Handle escalations and provide timely updates to stakeholders.
  • Support 24×7 operations, including on-call responsibilities.
  • Follow ITIL processes for incident, problem, and change management.

Documentation & Collaboration

  • Maintain accurate documentation in ticketing systems.
  • Create and update runbooks, SOPs, and knowledge articles.
  • Collaborate with cross-functional teams to improve reliability and efficiency.

Mandatory Skills

  • Strong hands-on experience in Microsoft Azure Infrastructure
  • Proven experience with Datadog monitoring and observability
  • Strong networking fundamentals (TCP/IP, DNS, VPN, Firewalls)
  • Experience with enterprise firewall technologies (Fortinet/Cisco/Palo Alto)
  • Excellent communication and customer handling skills
  • Strong troubleshooting and analytical abilities

Required Qualifications

  • 8+ years of experience in Cloud / Infrastructure / SRE / System Administration
  • Hands-on experience in Azure cloud support and operations
  • Experience supporting Windows Server environments
  • Strong exposure to incident management and production support

More Info


Job ID: 147375809
