Site Reliability Engineer (SRE)

4-8 Years
  • Posted a month ago

Job Description

  • Run the production environment by monitoring availability and taking a holistic view of system health.
  • Provide predictive insights into the health of the system and suggest measures to optimize and safeguard against future abnormalities.
  • Build software and systems to manage platform infrastructure and applications.
  • Improve the reliability, quality, and time-to-market of our suite of cloud and on-prem software solutions.
  • Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating for continual improvement.
  • Provide primary operational support and engineering for multiple large-scale distributed infrastructure and related applications.

Must-Have Skills:

  • 5+ years of experience and a proven track record of maintaining and supporting large-scale infrastructure and cloud systems.
  • Gather and analyze metrics from operating systems as well as applications to assist in performance tuning and fault finding.
  • Partner with development teams to improve services through rigorous testing and release procedures.
  • Participate in system design consulting, platform management, and capacity planning.
  • Create sustainable systems and services through automation and uplifts.
  • Balance feature development speed and reliability with well-defined service-level objectives.
  • In-depth, hands-on knowledge of automation technologies, with extensive expertise in Terraform or Ansible.
  • In-depth, hands-on knowledge of Linux and MySQL, and of programming and scripting in Bash and Python or an equivalent language.
  • In-depth knowledge of maintaining on-prem cloud solutions such as OpenStack / CloudStack / OpenNebula / vCloud, etc.
  • In-depth, hands-on knowledge of containers and container orchestration using Kubernetes.
  • In-depth, hands-on knowledge of any monitoring system (Prometheus / Nagios / Zabbix / SolarWinds / ManageEngine, etc.). Experience implementing correlation and predictive analysis in system monitoring.
  • Extensive hands-on experience implementing and maintaining high-availability systems, including ensuring backups and seamless business continuity.
  • Thorough conceptual knowledge of distributed systems, storage, networking, SDN, and SDS.

Good-to-Have Skills:

  • Knowledge of CloudStack / Citrix CloudPlatform and involvement as an administrator / maintainer / committer / tester / support engineer.
  • Data centre or ISP experience in a similar role.
  • Knowledge of GPU-based systems, Nvidia BCM, and GPU virtualisation techniques.
  • Experience supporting AI/ML workloads.

Qualification and Experience:

  • Relevant bachelor's degree

More Info

Open to candidates from:
Indian

About Company

Yotta Data Services | Powering Digital Transformation with Scalable Cloud, Colocation, and Managed Services.

Yotta Data Services offers a comprehensive suite of cloud, data center, and managed services designed to accelerate digital transformation for businesses of all sizes. With state-of-the-art infrastructure, cutting-edge AI capabilities, and a commitment to data sovereignty, we empower organisations to innovate securely and efficiently.

Job ID: 129481335
