Datavail

Senior Associate Cloud SRE

  • Posted 11 hours ago

Job Description

Job Title: Senior Associate Cloud SRE

Education: Any Graduate

Experience: 4-8 years

Location: Mumbai (Hybrid Model)

Employment Type: Full-time

Overview

We are seeking a Site Reliability Engineer to deliver tier two cloud operations managed services support across AWS and Azure environments. This role combines advanced troubleshooting and operational excellence with proactive reliability engineering, focusing on maintaining 24x7x365 service availability while continuously improving automation and operational efficiency across multi-cloud infrastructure.

Role Summary

As a Site Reliability Engineer supporting multi-cloud infrastructure (AWS and Azure), you will manage complex operational challenges and escalations while implementing reliability best practices across production systems. You will work collaboratively with customer teams and senior engineers to ensure system stability, automate operational workflows, and maintain comprehensive observability. This is a delivery-focused role requiring both advanced technical execution and operational ownership across cloud platforms.

Primary Responsibilities:

Tier 2 Multi-Cloud Operations & Managed Services:

AWS Operations:

  • Provide 24x7x365 tier two support and escalation handling for AWS environments
  • Execute complex operational tasks including:
      • Patching and managing Amazon Machine Images (AMIs)
      • Creating and configuring EC2 instances and RDS databases
      • Managing IAM roles, users, and policies
      • Configuring S3 bucket policies and Access Control Lists (ACLs)
      • Opening and managing network routes (VPC, subnets, security groups)
      • Restoring snapshots and database backups to lower environments
      • Increasing disk sizes (EBS volumes) and managing storage optimization
      • Implementing proper tagging for environment identification and cost allocation
      • Managing log archiving using CloudWatch Logs and S3

Azure Operations:

  • Provide equivalent tier two support for Azure cloud environments
  • Execute Azure-specific operational tasks including:
      • Managing and updating Azure Virtual Machine images
      • Creating and configuring Azure Virtual Machines and Azure SQL databases
      • Managing Azure Active Directory (AAD) identities, roles, and role-based access control (RBAC)
      • Configuring Azure Storage account policies and access controls
      • Managing Virtual Networks, Network Security Groups (NSGs), and route tables
      • Restoring VM snapshots and database backups to lower environments
      • Managing disk resizing and Azure Managed Disks optimization
      • Implementing Azure resource tagging and cost management
      • Managing log archiving using Azure Monitor and Log Analytics

Cross-Cloud Responsibilities:

  • Handle escalations from tier one support with deep technical analysis across both platforms
  • Provide root cause analysis for complex incidents in multi-cloud environments
  • Implement consistent operational standards across AWS and Azure
  • Support hybrid cloud connectivity and integration scenarios

Reliability & Incident Management:

  • Implement and maintain Service Level Indicators (SLIs) and Service Level Objectives (SLOs) across AWS and Azure in collaboration with senior engineers and customer stakeholders
  • Lead tier two incident response, performing advanced troubleshooting and resolution on both cloud platforms
  • Conduct thorough post-incident analysis with actionable remediation plans
  • Reduce reactive work by improving runbooks, alert configurations, and standard operating procedures for both clouds
  • Apply reliability engineering best practices with oversight and review
  • Mentor tier one engineers during incident response across multi-cloud scenarios

Automation & Infrastructure as Code:

  • Build and maintain CI/CD pipelines for infrastructure and application deployments on AWS and Azure
  • Automate complex operational tasks including patching, backups, and environment provisioning across both platforms
  • Develop infrastructure automation using Terraform for multi-cloud environments
  • Create sophisticated scripts and tooling to eliminate manual toil and improve operational efficiency
  • Implement Azure Resource Manager (ARM) templates or Bicep for Azure-specific automation
  • Follow established patterns and contribute continuous improvements
  • Document automation processes for knowledge sharing across cloud platforms

Containerization & Deployment:

  • Deploy and operate containerized workloads using Docker on AWS services (ECS, EKS) and Azure services (AKS, Azure Container Instances)
  • Support container reliability through proper health checks, autoscaling configurations, and resource management on both platforms
  • Implement safe deployment patterns (canary deployments, blue/green deployments) across AWS and Azure
  • Troubleshoot complex containerization and orchestration issues in multi-cloud Kubernetes environments
  • Follow and enhance established containerization standards across both cloud providers

Observability & Performance:

  • Configure and maintain comprehensive monitoring, logging, and alerting systems across AWS CloudWatch and Azure Monitor
  • Leverage observability data to identify issues and lead root cause analysis in multi-cloud environments
  • Contribute to performance tuning and cost optimization initiatives across both platforms
  • Ensure proper instrumentation and telemetry across AWS and Azure environments
  • Identify patterns and trends to prevent future incidents
  • Build custom dashboards and reports using CloudWatch, Azure Monitor, and third-party tools (Datadog, Grafana)

Collaboration & Customer Engagement:

  • Work closely with customer development and operations teams to improve system operability across cloud platforms
  • Participate in design reviews and reliability assessments for multi-cloud architectures
  • Communicate technical concepts, tradeoffs, and recommendations clearly to stakeholders
  • Provide regular operational updates and service reports covering both AWS and Azure
  • Act as technical liaison between customers and internal engineering teams

Required Qualifications & Experience:

  • 3-5 years of hands-on experience in DevOps, SRE, or production operations roles
  • Proven experience operating production systems in AWS OR Azure (deep expertise in one required)
  • Working knowledge or exposure to the secondary cloud platform (ability to learn and support)
  • Demonstrated experience managing containerized applications in production
  • Experience delivering managed services or supporting customer-facing infrastructure
  • Track record of handling complex technical escalations in cloud environments

Technical Skills - Primary Cloud Platform (AWS OR Azure):

For AWS-Primary Candidates:

  • AWS Services (Expert): Deep knowledge of EC2, RDS, S3, IAM, VPC, CloudWatch, Lambda, and related services
  • AWS Networking (Expert): Strong experience with VPCs, subnets, security groups, route tables, and VPN/Direct Connect
  • AWS Storage (Expert): Proficiency with EBS, S3, and backup/restore strategies
  • AWS Containers (Expert): Hands-on experience with ECS, EKS, or Fargate
  • Azure (Foundational): Basic understanding of Azure services with willingness to learn; exposure to Azure VMs, Storage, or networking is a plus

For Azure-Primary Candidates:

  • Azure Services (Expert): Deep knowledge of Azure VMs, Azure SQL, Storage Accounts, Azure AD, Virtual Networks, Azure Monitor
  • Azure Networking (Expert): Strong experience with VNets, NSGs, Application Gateway, Azure Firewall, and ExpressRoute
  • Azure Storage (Expert): Proficiency with Managed Disks, Blob Storage, and Azure Backup
  • Azure Containers (Expert): Hands-on experience with AKS (Azure Kubernetes Service) and Azure Container Instances
  • AWS (Foundational): Basic understanding of AWS services with willingness to learn; exposure to EC2, S3, or VPC is a plus

Technical Skills - Cross-Platform (All Candidates):

  • Infrastructure as Code: Proficiency with Terraform (preferred) or CloudFormation/ARM templates
  • CI/CD: Experience building and maintaining automated deployment pipelines (Azure DevOps, GitHub Actions, Jenkins, GitLab CI)
  • Scripting/Programming: Proficiency in Python, PowerShell, Bash, or similar languages
  • Containerization: Strong Docker skills and Kubernetes experience
  • Monitoring & Logging: Experience with cloud-native monitoring tools and/or third-party observability platforms (Datadog, Splunk, ELK, Grafana)
  • Version Control: Proficiency with Git and collaborative development workflows
  • Troubleshooting: Advanced diagnostic and problem-solving capabilities

Operational Capabilities:

  • Experience with 24x7 operations and tier two escalation support
  • Strong troubleshooting and root cause analysis skills
  • Understanding of networking concepts, security best practices, and compliance requirements
  • Familiarity with backup/restore procedures and disaster recovery planning
  • Ability to work under pressure during critical incidents
  • Experience coordinating across distributed teams
  • Willingness and ability to quickly learn the secondary cloud platform

Preferred Qualifications & Certifications:

  • AWS Certifications (for AWS-primary): Solutions Architect Associate, SysOps Administrator, or DevOps Engineer Professional
  • Azure Certifications (for Azure-primary): Azure Administrator Associate (AZ-104) or Azure Solutions Architect Expert (AZ-305)
  • Cloud-agnostic certifications (Terraform Associate, CKA, or SRE Foundation)

Additional Preferred Experience:

  • Any hands-on experience with both AWS and Azure (even if limited in one)
  • Experience with Kubernetes in production environments
  • Prior consulting or managed services provider experience
  • Experience with hybrid cloud or cloud migration projects
  • Experience with configuration management tools (Ansible, Chef, Puppet)
  • Knowledge of security and compliance frameworks (HIPAA, SOC 2, PCI-DSS)
  • Experience in high-traffic or mission-critical industries
  • Experience with cost optimization and FinOps practices
  • Multi-cloud architecture or implementation experience

About Us

Datavail is a leading provider of data management, application development, analytics, and cloud services, with more than 1,000 professionals helping clients build and manage applications and data via a world-class tech-enabled delivery platform and software solutions across all leading technologies. For more than 17 years, Datavail has worked with thousands of companies spanning different industries and sizes, and is an AWS Advanced Tier Consulting Partner, a Microsoft Solutions Partner for Data & AI and Digital & App Innovation (Azure), an Oracle Partner, and a MySQL Partner.

About The Team

Datavail's Team of Cloud Experts Can Save You Time and Money

Our Cloud experts can overcome every obstacle in helping clients manage everything from databases, analytics, reporting, migrations, and upgrades to monitoring and overall data management.

You can free up your IT resources to focus on growing your business rather than fighting fires. Our Cloud experts can guide you through strategic initiatives or support routine database management.

Cloud Managed Services

Datavail's business focuses on helping you use your data to drive business results through cost-saving services. The success of your business depends on how well you understand and manage your data. Our managed cloud services give you the power to unleash your organization's potential. We provide comprehensive and technically advanced support for cloud operations to ensure that your infrastructure is safe, secure, and managed with the utmost level of care.

Our delivery performance in data management leads the industry. We offer highly trained Cloud administrators via a 24x7, always-on, always-available global delivery model.

The combination of a proven delivery model and top-notch experience ensures that Datavail will remain the on-demand Cloud experts you desire. Datavail's flexible, client-focused services always add value to your organization.

Job ID: 142131429