SWITS DIGITAL Private Limited

Site Reliability Engineer (SRE)

  • Posted a month ago

Job Description

Role: Site Reliability Engineer (SRE)

Location: Hyderabad

Experience: 10-15 Years

Job Summary

The Site Reliability Engineer (SRE) will play a critical role in ensuring the reliability, scalability, and performance of Citizens Bank's enterprise systems and cloud environments. The ideal candidate brings deep technical expertise across multi-cloud platforms, automation, observability, and incident management, and drives reliability engineering practices and operational excellence in a complex financial services environment.

Key Responsibilities

  • Manage and support cloud-based solutions across AWS, Azure, GCP, and other IaaS/PaaS/SaaS/CDN environments.
  • Design, implement, and maintain reliable, scalable, and secure infrastructure, ensuring high availability and performance.
  • Collaborate with DevOps and security teams to implement DevSecOps workflows using Git, Jenkins, Docker, and Kubernetes (EKS/AKS).
  • Automate infrastructure and configuration management using Terraform, Ansible, and scripting languages like Python, Bash, or PowerShell.
  • Analyze traffic flows, system logs, and application events to troubleshoot issues and identify interdependencies across systems.
  • Utilize monitoring and observability tools such as DataDog, Splunk, and CloudWatch for proactive system health management.
  • Implement on-call support processes, develop and maintain runbook documentation, and work toward full automation of repetitive tasks.
  • Collaborate with other SREs to build resilient systems and promote Site Reliability Engineering best practices across the enterprise.
  • Handle critical application outages, perform root cause analysis, and drive incident resolution and preventive measures.
  • Work within an Agile environment, partnering with cross-functional teams to continuously improve performance and reliability.

Technical Skills Required

  • Cloud Platforms: AWS, Azure, GCP
  • DevOps/DevSecOps Tools: Jenkins, Git, Docker, Kubernetes (EKS, AKS)
  • Infrastructure as Code (IaC): Terraform, Ansible
  • Monitoring & Logging: DataDog, Splunk, CloudWatch
  • Scripting: Python, Bash, PowerShell
  • Networking: TCP/IP, DNS, HTTP, Load Balancing, Routing
  • OS Environments: Linux, Windows Server
  • Familiarity with AMI builds, patching, and rehydration processes

Core Competencies

  • Strong analytical and troubleshooting skills
  • Proven ability to drive incident response and post-incident reviews
  • Excellent communication and stakeholder management
  • Ability to collaborate in global, distributed teams
  • Focus on automation, resilience, and continuous improvement

More Info

Job ID: 130482167