Site Reliability Engineer

4-6 Years
  • Posted a day ago
  • Over 50 applicants

Job Description

We are looking for a talented and proactive Site Reliability Engineer (SRE) to join our infrastructure and operations team. The ideal candidate will combine software engineering expertise with systems engineering skills to build scalable, reliable, and efficient systems. As an SRE, you will be responsible for ensuring high availability, performance, and reliability of critical systems and applications.

Key Responsibilities:

  • Design, implement, and manage scalable, resilient, and secure infrastructure systems.
  • Monitor, maintain, and improve system reliability, availability, scalability, and performance.
  • Build and enhance CI/CD pipelines using tools like Jenkins, GitLab CI, or Azure DevOps.
  • Develop infrastructure as code using Terraform, Ansible, or similar tools.
  • Automate operational processes and improve system observability through monitoring and alerting.
  • Troubleshoot and resolve production issues across services and technology stacks.
  • Collaborate with development, QA, and DevOps teams to define SLAs, SLOs, and SLIs.
  • Conduct post-incident reviews and develop action plans to prevent recurrence.
  • Participate in on-call rotations and ensure effective incident response.
  • Ensure security, compliance, and best practices are followed in infrastructure and deployments.

Required Skills:

  • 4-6 years of hands-on experience in Site Reliability Engineering, DevOps, or System Administration roles.
  • Strong proficiency in Linux/Unix administration.
  • Experience with cloud platforms such as AWS, Azure, or GCP.
  • Proficiency in one or more programming/scripting languages (Python, Go, Bash).
  • Experience with monitoring and alerting tools (e.g., Prometheus, Grafana, Datadog, ELK, Splunk).
  • Knowledge of containerization and orchestration tools (e.g., Docker, Kubernetes).
  • Familiarity with version control systems (e.g., Git).

Preferred Skills:

  • Experience with incident management and root cause analysis.
  • Familiarity with zero-downtime deployment strategies such as blue-green and canary deployments.
  • Experience in performance tuning, load testing, and resilience engineering.
  • Certification in cloud platforms (AWS/Azure/GCP) is a plus.

More Info

Open to candidates from: Indian

About Company

Teamware Solutions, a business division of Quantum Leap Consulting Private Limited, offers cutting-edge industry solutions that deliver business value for our clients' staffing initiatives. With deep domain expertise in the Banking, Financial Services and Insurance, Oil and Gas, Infrastructure, Manufacturing, Retail, Telecom, and Healthcare industries, Teamware specializes in skills augmentation and professional consulting services.

Job ID: 121758011
