ROLE TITLE: Azure Cloud Infrastructure & Site Reliability Engineer
This is a cloud infrastructure, operations-first role. You will act as the technical bridge between software development, QA, and mechanical hardware teams. You will own the Azure cloud environment, drive SRE practices (SLIs/SLOs), and enforce Code-First delivery standards to reduce toil and keep hardware-software integration stable.
- Primary responsibilities
- Azure Architecture & Hybrid Infrastructure
- Architect and manage scalable Azure environments (App Service, APIM, Cosmos DB, AKS, SQL).
- Manage hybrid connectivity and hardware integration: VNets, NSGs, Private Endpoints, and ExpressRoute/VPN.
- Lead IaC adoption: Maintain production-grade Terraform or Bicep modules with strict state management.
- Own the Code-First lifecycle: Enforce a zero-manual-change (no Click-Ops) policy in production.
- Coordinate hardware-software integration: Validate compatibility through simulations and real-time testing in collaboration with Mechanical/Operations teams.
- SRE Practices & Incident Management
- Define and maintain SLIs/SLOs for all customer-facing services; publish error budgets monthly.
- Own end-to-end incident management: detection, triage, war room coordination, and blameless post-mortems.
- Track MTTR, MTTD, and change failure rates; present performance trends to engineering/CTO leadership.
- Drive toil reduction: Identify manual operational bottlenecks and automate at least 30% of them per quarter.
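For illustration, the monthly error-budget reporting mentioned above could be computed along these lines. This is a minimal sketch, assuming a request-based availability SLI; the function name, the field names, and the 99.9% target in the usage lines are hypothetical and not taken from this role description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorBudget:
    allowed_failures: float    # failures the SLO tolerates in the window
    consumed_fraction: float   # share of the budget used (exceeds 1.0 when overspent)
    remaining_fraction: float  # share still available (negative when overspent)


def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> ErrorBudget:
    """Compute a request-based error budget for one reporting window."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 <= failed_requests <= total_requests:
        raise ValueError("failed_requests must be between 0 and total_requests")

    # The budget is the share of requests the SLO allows to fail.
    allowed = (1.0 - slo_target) * total_requests
    consumed = failed_requests / allowed
    return ErrorBudget(allowed, consumed, 1.0 - consumed)


if __name__ == "__main__":
    # Hypothetical month: 1,000,000 requests, 250 failures, 99.9% SLO target.
    budget = error_budget(0.999, 1_000_000, 250)
    print(f"Budget consumed: {budget.consumed_fraction:.1%}")
```

A time-based SLI (minutes of downtime per window) would follow the same arithmetic with minutes in place of request counts.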
- Monitoring, Observability & Security
- Own the monitoring stack: Azure Monitor, Log Analytics, Application Insights, Prometheus, and Grafana.
- Implement distributed tracing and robust alerting; eliminate noisy false-positive alerts.
- Enforce security/compliance (ISO, GxP, GDPR): Implement RBAC, Managed Identities, and Azure Key Vault policies.
- Use Microsoft Defender for Cloud and Sentinel to assess vulnerabilities and mitigate risks.
- DevOps, Automation & Agile Delivery
- Maintain Azure DevOps (YAML) pipelines for infrastructure, environment refresh, and config drift correction.
- Translate business requirements into technical deliverables and work items in Azure Boards.
- Act as the infrastructure approver for all UAT and Production deployments.
- Scripting & Automation: Develop production-grade automation in Python, Bash, or PowerShell for backup verification, cert rotation, and log archival.
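As an example of the kind of automation listed above, the expiry check at the core of a cert rotation job might look like the sketch below. It assumes the certificate's notAfter timestamp has already been read (for example from Key Vault or the certificate file) as a timezone-aware datetime; the function names and the 30-day threshold are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def days_until_expiry(not_after: datetime, now: Optional[datetime] = None) -> int:
    """Return whole days left before not_after (negative once expired).

    Both datetimes must be timezone-aware; mixing naive and aware
    values raises TypeError on subtraction.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    # timedelta.days rounds toward negative infinity, so a certificate
    # that expired an hour ago yields -1 rather than 0.
    return (not_after - now).days


def needs_rotation(not_after: datetime, threshold_days: int = 30,
                   now: Optional[datetime] = None) -> bool:
    """True when the certificate expires within threshold_days (assumed policy)."""
    return days_until_expiry(not_after, now) < threshold_days


if __name__ == "__main__":
    # Hypothetical certificate that expires 10 days from now.
    expiry = datetime.now(timezone.utc) + timedelta(days=10)
    print("Rotate now" if needs_rotation(expiry) else "No action needed")
```

Keeping the check free of network calls makes it easy to unit-test; fetching the certificate and triggering the renewal would live in separate steps of the pipeline.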
- Collaboration and Support:
- Work closely with development, QA, and operations teams to ensure smooth delivery of applications.
- Provide guidance and support to development teams on best practices for cloud architecture and DevOps.
- Conduct training and knowledge-sharing sessions for team members.
- Documentation and Reporting:
- Maintain comprehensive documentation of infrastructure, configurations, and procedures.
- Generate reports and dashboards to provide visibility into system performance, deployments, and issues.
- Knowledge and understanding of ITIL
- Operating System:
- Windows and Linux Proficiency
- Certifications
- Azure certifications (e.g., Microsoft Certified: Azure Solutions Architect, Azure DevOps Engineer).
- Experience:
- 6+ years of total experience.
- Proven experience in Azure infrastructure management and DevOps practices.
- Hands-on experience with Azure services (e.g., VMs, AKS, App Services, Azure SQL, Functions, etc.).
- Strong knowledge of CI/CD tools and practices.
- Familiarity with project management tools such as Jira and Azure Boards.
- Technical Skills:
- Microsoft Azure Services: Expertise in Azure DevOps (Repos, Pipelines, Boards, Artifacts), Azure Monitor, Application Insights, Key Vault, Security Center (now Microsoft Defender for Cloud), Azure Active Directory (now Microsoft Entra ID), VMs, AKS, App Services, SQL Database, and Azure Functions. Certifications: AZ-104, AZ-400, AZ-305.
- CI/CD and Automation Tools: Strong hands-on experience with Git, Jenkins, GitHub Actions, Azure Pipelines, and automation tools like Ansible.
- PaaS/SaaS Depth: Hands-on experience with Azure PaaS (APIM, Cosmos DB, Functions, AKS) and SaaS service integration.
- IaC & Automation: Production-grade Terraform or Bicep; experience with Ansible for configuration management.
- Scripting and Automation: PowerShell, Python, Bash, YAML for task automation and infrastructure management.
- Containerization and Orchestration: Docker, Kubernetes (AKS) for container management and orchestration.
- Monitoring and Logging: Experience with Azure Monitor, Application Insights, Log Analytics, Sentinel for performance monitoring and alerting.
- Security and Compliance: Familiarity with Microsoft Defender for Cloud (formerly Azure Security Center), OWASP, and static code analysis tools; experience implementing RBAC, secure networking, and identity management.
- Version Control: Git, GitHub, Azure Repos for version control and collaboration.
- Databases: Hands-on experience with Azure SQL, Cosmos DB, MySQL, and Microsoft SQL Server.
- Operating Systems: Windows Server and Linux (Ubuntu, Red Hat).
- Networking: Configuring Virtual Networks, Load Balancers, VPN Gateways, and NSG rules, and applying cloud networking best practices.
- DevOps and Agile Methodologies: Strong understanding of agile development, cloud automation, and DevOps practices.
- Soft Skills:
- Excellent problem-solving and troubleshooting skills.
- Strong communication and collaboration abilities.
- Ability to work in a fast-paced, dynamic environment.
- Maintain clear communication and coordination with cross-functional teams.
Internal Interfaces
External Interfaces
- Competency
Technical Competency – T Compass
Leadership Competency – L Compass
- Azure Cloud Architecture & PaaS/SaaS: Expert
- Site Reliability Engineering (SRE): Expert
- IaC (Terraform / Bicep): Advanced
- Azure DevOps (YAML Pipelines): Advanced
- Problem Solving & Incident Analysis: Advanced
- Azure Boards (Agile): Advanced
- Windows Server Administration: Medium
- Linux: Medium
- Cloud Security & Compliance (GxP/ISO): Medium
- Scripting (Python / PowerShell / Bash): Advanced
- Educational and Experience Requirements
- Level of Education: Graduate (minimum requirement); Engineering Graduate (desired)
- Experience: 6+ years (minimum requirement); 6+ years (desired)