Senior Site Reliability Engineer

5-8 Years
  • Posted a month ago

Job Description

Key Responsibilities:

  • Lead incident management, monitoring, and alerting processes to ensure timely detection and resolution of production issues.
  • Ensure reliability, availability, and performance of systems by defining and maintaining SLIs, SLOs, and SLAs.
  • Design and implement fault-tolerant, scalable architectures to minimize downtime and improve resiliency.
  • Develop automation and tooling for monitoring, incident remediation, and infrastructure management.
  • Participate in a 24x7 on-call rotation to manage production incidents and maintain system uptime.
  • Create and maintain SOPs and technical documentation for processes, tools, and incident management protocols.
  • Implement and manage Infrastructure as Code (IaC) using tools such as Terraform and Ansible to automate provisioning and deployments.
  • Work with cloud platforms, primarily AWS (EC2, S3, VPC, RDS, EKS, ECS, CloudWatch, CloudFormation), to support scalable system operations.
  • Integrate and manage CI/CD pipelines using tools like Jenkins to enable seamless deployments.
  • Utilize monitoring and alerting tools (Datadog, Site24x7, Grafana, CloudWatch) to proactively identify issues.
  • Conduct performance tuning and optimization, addressing bottlenecks and improving efficiency.
  • Drive cost optimization strategies while maintaining performance and reliability standards.
  • Adhere to security best practices and ensure infrastructure compliance with organizational standards.
  • Collaborate with development, product, and security teams to enhance system reliability and service delivery.
  • Mentor junior engineers and promote a culture of reliability engineering across the organization.

Qualifications:

  • 5-8 years of experience in Site Reliability Engineering, DevOps, or Cloud Infrastructure roles.
  • Strong hands-on expertise with AWS (experience with GCP or Azure is a plus).
  • Proficiency in Infrastructure as Code (IaC) tools such as Terraform and Ansible.
  • Experience with monitoring and alerting tools including Datadog, Site24x7, Grafana, and CloudWatch.
  • Solid understanding of CI/CD tools such as Jenkins.
  • Proven ability in incident management, root cause analysis, and implementing long-term reliability improvements.
  • Familiarity with automation scripting (Python or Bash/shell scripting preferred).
  • Knowledge of security best practices, networking, and cloud cost management.
  • Excellent problem-solving, analytical, and collaboration skills.
  • AWS certification or equivalent cloud certification is an advantage.

More Info

Open to candidates from: Indian

About Company

CyberArk is the global leader in Identity Security. Centered on privileged access management, CyberArk provides the most comprehensive security offering for any identity – human or machine – across business applications, distributed workforces, hybrid cloud workloads, and the DevOps lifecycle. The world’s leading organizations trust CyberArk to help secure their most critical assets. For over 25 years, CyberArk has led the market in securing enterprises against cyber attacks that take cover behind insider privileges and attack critical enterprise assets. Today, only CyberArk delivers a new category of targeted security solutions that help leaders stop reacting to cyber threats and get ahead of them, preventing attack escalation before irreparable business harm is done. At a time when auditors and regulators recognize that privileged accounts are the fast track for cyber attacks and demand stronger protection, CyberArk’s security solutions master high-stakes compliance and audit requirements while arming businesses to protect what matters most.

Job ID: 130506623
