SRE DevOps Engineer

Persistent Systems

Pune, India

6-12 Years

Posted 2 days ago

Job Description

About Persistent

We are an AI-led, platform-driven Digital Engineering and Enterprise Modernization partner, combining deep technical expertise and industry experience to help our clients anticipate what's next. Our offerings and proven solutions create a unique competitive advantage for our clients by giving them the power to see beyond and rise above. We work with many industry-leading organizations across the world, including 20 Fortune 50 companies and 4 of the 5 top banks in both the US and India, and numerous innovators across the healthcare ecosystem.

Our disruptor's mindset, commitment to client success, and agility to thrive in a dynamic environment have enabled us to sustain our growth momentum. Persistent has been recognized across top industry platforms for innovation, leadership, and inclusion. We reported $1,654.4M in FY26 revenue with 17.4% Y-o-Y growth. We have delivered 24 sequential quarters of growth, with $436.0M in Q4 FY26 revenue, up 3.2% Q-o-Q and 16.2% Y-o-Y. Our 27,500+ global team members, located in 18 countries, have been instrumental in helping market leaders transform their industries. We have been recognized as the Fastest Growing IT Services Brand Globally in the 2026 Brand Finance IT Services 25 Report. We were named a Leader in the Everest Group Private Equity (PE) Services PEAK Matrix® Assessment 2026 and the Software Product Engineering PEAK Matrix® Assessment 2026.

About Position

We are seeking a skilled Site Reliability Engineer to maintain and support a production SaaS application environment hosted on AWS. The ideal candidate will ensure system reliability, scalability, and performance while supporting ongoing production operations.

Role: SRE Cloud DevOps Engineer
Location: Pune
Experience: 6 to 12 Years
Job Type: Full Time Employment

What You'll Do

Maintain and support a highly available production SaaS environment on AWS
Manage and optimize services including EKS, Kafka, S3, EC2, VPC, Cassandra, and networking components
Implement and manage Infrastructure as Code (IaC) using Terraform
Handle production deployments, upgrades, patching, and release rollouts
Monitor system performance, troubleshoot issues, and ensure high reliability
Own backup and disaster recovery strategies, including planning and executing regular DR exercises
Participate in 24/7/365 on-call support via PagerDuty
Collaborate across teams with regular overlap in US West and Europe time zones
Apply DevSecOps best practices and maintain CI/CD pipelines

Expertise You'll Bring

The ideal Site Reliability Engineer (SRE) will bring the following expertise:
Production-Grade AWS Operations
Deep hands-on experience running and supporting highly available, mission‑critical SaaS platforms on AWS, with strong ownership of uptime, scalability, and performance.
Advanced AWS & Networking Expertise
Proven ability to design, manage, and optimize complex AWS environments including VPCs, EC2, S3, EKS, and secure networking architectures for large-scale distributed systems.
Kubernetes & Distributed Systems Mastery
Strong operational expertise in EKS, container orchestration, and managing distributed systems such as Kafka and Cassandra in production.
Infrastructure as Code Leadership
Expert-level use of Terraform to build, standardize, and govern infrastructure automation, ensuring repeatability, compliance, and reduced operational risk.
Reliable Release & Deployment Management
Extensive experience handling production deployments, upgrades, patching, and release rollouts with minimal downtime and well-defined rollback strategies.
Monitoring, Incident Response & On‑Call Excellence
Strong background in observability, alerting, and incident management, including participation in 24/7 on-call rotations using tools like PagerDuty and driving post-incident improvements.
Disaster Recovery & Business Continuity Ownership
Demonstrated expertise in defining, implementing, and validating backup, disaster recovery, and failover strategies, including leading regular DR drills and resilience testing.
DevSecOps & CI/CD Enablement
Hands‑on experience with CI/CD pipelines, GitOps workflows, and DevSecOps best practices, embedding security, reliability, and automation throughout the SDLC.
Cross‑Regional Collaboration
Ability to effectively collaborate with global teams, ensuring smooth operations and communication across US West and European time zones.
Reliability Mindset & Continuous Improvement
A strong SRE mindset focused on SLIs, SLOs, error budgets, automation, and continuous reliability improvements to support long-term platform growth.

Benefits

Competitive salary and benefits package
Culture focused on talent development with quarterly growth opportunities and company-sponsored higher education and certifications
Opportunity to work with cutting-edge technologies
Employee engagement initiatives such as project parties, flexible work hours, and Long Service awards
Annual health check-ups
Insurance coverage: group term life, personal accident, and Mediclaim hospitalization for self, spouse, two children, and parents

Values-Driven, People-Centric & Inclusive Work Environment

Persistent is dedicated to fostering diversity and inclusion in the workplace. We invite applications from all qualified individuals, including those with disabilities, and regardless of gender or gender preference. We welcome diverse candidates from all backgrounds.

We support hybrid work and flexible hours to fit diverse lifestyles.
Our office is accessibility-friendly, with ergonomic setups and assistive technologies to support employees with physical disabilities.
If you are a person with disabilities and have specific requirements, please inform us during the application process or at any time during your employment.

Let's unleash your full potential at Persistent - persistent.com/careers

Persistent is an Equal Opportunity Employer and prohibits discrimination and harassment of any kind.

Ansible, AWS, Terraform