Job Title : SRE Lead
Experience : 8-19 Years
Location : Mumbai / Pune / Bangalore / Hyderabad / Chennai / Delhi / Indore / Nagpur / Bhubaneshwar / Jaipur / Kolkata
Skills : SDLC experience, DevOps
Job Description :
Apply best practices to ensure and improve the high availability, reliability, and recoverability of our platforms.
Work with proprietary tools that mitigate weaknesses in incident management or software delivery.
Maintain disaster recovery (DR) and business continuity automation, and perform routine DR trials.
Experience writing software and a good understanding of the Software Development Life Cycle (SDLC).
Participate in platform management and capacity management practices.
Maintain service level indicators (SLIs), adjusting them as necessary so they accurately represent service reliability as services evolve and grow.
Develop, maintain, and configure cloud observability systems (e.g., Datadog, Splunk, OpenTelemetry, APM).
Build flexible monitoring and alerting to proactively address issues before they become incidents.
Identify and address performance issues and optimize system performance.
Partner with development teams to establish application production readiness through rigorous testing and release procedures.
Participate in on-call rotations for incident response and postmortem investigation.
Participate in rigorous training both within and across engineering teams.
Qualification :
Bachelor's degree in Computer Science or related field
10+ years of experience in Site Reliability Engineering or related field
5+ years of experience across the Software Development Life Cycle
Proficient in the Java programming language
Experience reviewing Java application code for performance and reliability
Strong proficiency in one or more programming or scripting languages (Go, Python, TypeScript, or shell scripting)
Experience with monitoring and logging tools such as Datadog, Splunk, and ELK
Experience with cloud computing platforms such as AWS or Azure, with a preference for GCP.
Understanding of Linux and Windows operating systems and networking fundamentals
Understanding of distributed data streaming technologies such as Kafka.
Experience with containerization and orchestration systems such as Docker Swarm and Kubernetes, and with Kubernetes tooling such as Helm
Strong troubleshooting and problem-solving skills
Strong understanding of cloud networking.
Strong understanding of DevOps principles and practices
Strong experience with distributed systems and microservices architecture
Strong experience working with Git and version control platforms (GitHub and GitLab).
Experience with issue tracking tools such as Jira.
Shift Timing : 2 pm to 11 pm
Skills Required :
DevOps
Site Reliability Engineering
Software Deployment
Capacity Planning
Code Review
Computer Science
Production Readiness
Shell Scripting
System Performance