About Persistent
We are an AI-led, platform-driven Digital Engineering and Enterprise Modernization partner, combining deep technical expertise and industry experience to help our clients anticipate what's next. Our offerings and proven solutions create a unique competitive advantage for our clients by giving them the power to see beyond and rise above. We work with many industry-leading organizations across the world, including 20 Fortune 50 companies and 4 of the 5 top banks in both the US and India, and numerous innovators across the healthcare ecosystem.
Our disruptor's mindset, commitment to client success, and agility to thrive in a dynamic environment have enabled us to sustain our growth momentum. Persistent has been recognized across top industry platforms for innovation, leadership, and inclusion. We reported FY26 revenue of $1,654.4M, up 17.4% Y-o-Y. We have delivered 24 sequential quarters of growth, with Q4 FY26 revenue of $436.0M, up 3.2% Q-o-Q and 16.2% Y-o-Y. Our 27,500+ global team members, located in 18 countries, have been instrumental in helping market leaders transform their industries. We have been recognized as the Fastest Growing IT Services Brand Globally in the 2026 Brand Finance IT Services 25 Report. We were named a Leader in the Everest Group Private Equity (PE) Services PEAK Matrix® Assessment 2026 and the Software Product Engineering PEAK Matrix® Assessment 2026.
About Position
We are seeking a skilled Site Reliability Engineer to maintain and support a production SaaS application environment hosted on AWS. The ideal candidate will ensure system reliability, scalability, and performance while supporting ongoing production operations.
- Role: SRE Cloud DevOps Engineer
- Location: Pune
- Experience: 6 to 12 Years
- Job Type: Full Time Employment
What You'll Do
- Maintain and support a highly available production SaaS environment on AWS
- Manage and optimize services including EKS, Kafka, S3, EC2, VPC, Cassandra, and networking components
- Implement and manage Infrastructure as Code (IaC) using Terraform
- Handle production deployments, upgrades, patching, and release rollouts
- Monitor system performance, troubleshoot issues, and ensure high reliability
- Own backup and disaster recovery strategies, including planning and executing regular DR exercises
- Participate in 24/7/365 on-call support via PagerDuty
- Collaborate across teams with regular overlap in US West and Europe time zones
- Apply DevSecOps best practices and maintain CI/CD pipelines
Expertise You'll Bring
The ideal Site Reliability Engineer (SRE) will bring the following expertise:
- Production-Grade AWS Operations: Deep hands-on experience running and supporting highly available, mission‑critical SaaS platforms on AWS, with strong ownership of uptime, scalability, and performance.
- Advanced AWS & Networking Expertise: Proven ability to design, manage, and optimize complex AWS environments including VPCs, EC2, S3, EKS, and secure networking architectures for large-scale distributed systems.
- Kubernetes & Distributed Systems Mastery: Strong operational expertise in EKS, container orchestration, and managing distributed systems such as Kafka and Cassandra in production.
- Infrastructure as Code Leadership: Expert-level use of Terraform to build, standardize, and govern infrastructure automation, ensuring repeatability, compliance, and reduced operational risk.
- Reliable Release & Deployment Management: Extensive experience handling production deployments, upgrades, patching, and release rollouts with minimal downtime and well-defined rollback strategies.
- Monitoring, Incident Response & On‑Call Excellence: Strong background in observability, alerting, and incident management, including participation in 24/7 on-call rotations using tools like PagerDuty and driving post-incident improvements.
- Disaster Recovery & Business Continuity Ownership: Demonstrated expertise in defining, implementing, and validating backup, disaster recovery, and failover strategies, including leading regular DR drills and resilience testing.
- DevSecOps & CI/CD Enablement: Hands‑on experience with CI/CD pipelines, GitOps workflows, and DevSecOps best practices, embedding security, reliability, and automation throughout the SDLC.
- Cross‑Regional Collaboration: Ability to effectively collaborate with global teams, ensuring smooth operations and communication across US West and European time zones.
- Reliability Mindset & Continuous Improvement: A strong SRE mindset focused on SLIs, SLOs, error budgets, automation, and continuous reliability improvements to support long-term platform growth.
Benefits
- Competitive salary and benefits package
- Culture focused on talent development with quarterly growth opportunities and company-sponsored higher education and certifications
- Opportunity to work with cutting-edge technologies
- Employee engagement initiatives such as project parties, flexible work hours, and Long Service awards
- Annual health check-ups
- Insurance coverage: group term life, personal accident, and Mediclaim hospitalization for self, spouse, two children, and parents
Values-Driven, People-Centric & Inclusive Work Environment
Persistent is dedicated to fostering diversity and inclusion in the workplace. We invite applications from all qualified individuals, including those with disabilities, and regardless of gender or gender preference. We welcome diverse candidates from all backgrounds.
- We support hybrid work and flexible hours to fit diverse lifestyles.
- Our office is accessibility-friendly, with ergonomic setups and assistive technologies to support employees with physical disabilities.
- If you are a person with disabilities and have specific requirements, please inform us during the application process or at any time during your employment.
Let's unleash your full potential at Persistent - persistent.com/careers
Persistent is an Equal Opportunity Employer and prohibits discrimination and harassment of any kind.
Ansible, AWS, Terraform