
Impronics Technologies

Site Reliability Engineer (SRE)

  • Posted 23 days ago

Job Description

We are seeking a seasoned Site Reliability Engineer (SRE) with a solid background in payment systems and high-availability architectures. The ideal candidate will have hands-on experience managing large-scale, distributed systems in production, with a deep understanding of reliability, scalability, and performance tuning in the financial services or payments industry.

Key Responsibilities

  • Design, build, and maintain scalable, resilient, and secure infrastructure for high-volume payment platforms.
  • Ensure system uptime, reliability, and performance through effective monitoring, alerting, and incident response strategies.
  • Collaborate with software engineering and DevOps teams to implement CI/CD pipelines and improve deployment efficiency.
  • Automate infrastructure management tasks using Infrastructure-as-Code (IaC) tools (Terraform, Ansible, etc.).
  • Proactively identify and mitigate system bottlenecks and potential points of failure.
  • Manage disaster recovery strategies, failover planning, and performance testing for critical payment services.
  • Work with development teams to ensure services are designed for reliability, scalability, and observability from the ground up.
  • Participate in root cause analysis and post-incident reviews to prevent future outages.

Required Skills & Experience

  • 8+ years of overall experience in infrastructure engineering or SRE roles, including at least 3 years in the payments/fintech domain.
  • Strong understanding of payment protocols (UPI, IMPS, RTGS, NEFT, SWIFT, etc.) and transaction processing systems.
  • Proven expertise in Linux systems administration, cloud platforms (AWS, GCP, or Azure), and container orchestration (Kubernetes).
  • Solid experience with monitoring/logging tools like Prometheus, Grafana, ELK Stack, Splunk, etc.
  • Proficiency in one or more scripting or programming languages (Python, Shell, Go, etc.) for automation.
  • Experience with incident management, SLAs, and system troubleshooting in high-pressure environments.
  • Familiarity with security and compliance practices in the financial sector (e.g., PCI-DSS, ISO 27001).

Preferred Qualifications

  • Previous experience supporting mission-critical applications in banking or financial services.
  • Exposure to Kafka, Redis, or other real-time streaming and caching technologies.
  • Experience with Site Reliability Engineering principles and implementing SLOs/SLIs.
  • Understanding of the error budget concept and how it ties into availability and release decisions.
  • Experience with performance testing tools such as k6, JMeter, or LoadRunner.
  • Familiarity with mocking tools such as Mockito, WireMock, or Microcks.
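
Here is a rough illustration of the error budget idea. With an availability SLO of 99.9%, the error budget is the 0.1% of requests that may fail within the measurement window. Once that budget is spent, risky releases are typically paused. The Python sketch below assumes a request-based availability SLI and a simple, hypothetical release policy. The class and method names are illustrative only and do not come from any particular tool.

```python
from dataclasses import dataclass


@dataclass
class SloWindow:
    """Hypothetical view of one SLO measurement window (request-based SLI)."""
    slo_target: float      # e.g. 0.999 for a 99.9% availability SLO
    total_requests: int    # requests observed in the window
    failed_requests: int   # requests that counted as SLI failures

    def error_budget(self) -> float:
        """Allowed failures in the window: (1 - SLO) * total requests."""
        return (1.0 - self.slo_target) * self.total_requests

    def budget_remaining(self) -> float:
        """Fraction of the error budget still unspent (negative if overspent)."""
        budget = self.error_budget()
        if budget <= 0:
            # A 100% SLO leaves no room for failures at all.
            return 0.0
        return 1.0 - self.failed_requests / budget

    def release_allowed(self) -> bool:
        """Assumed policy: allow releases only while some budget remains."""
        return self.budget_remaining() > 0.0


window = SloWindow(slo_target=0.999, total_requests=1_000_000, failed_requests=400)
print(round(window.error_budget()))         # 1000 failures allowed
print(round(window.budget_remaining(), 2))  # 0.6 of the budget left
print(window.release_allowed())             # True
```

In practice, teams often tie the policy to the burn rate rather than a hard zero threshold. The hard threshold simply keeps the sketch short.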

More Info

Job ID: 132337283