ZETA

Site Reliability Engineer I

1-3 Years

Job Description

Responsibilities

  • System Reliability: Ensuring the reliability of software systems by designing, implementing, and maintaining scalable and reliable infrastructure.
  • Automation: Developing automation tools and scripts to streamline operational tasks, reduce manual intervention, and improve overall system efficiency.
  • Incident Response and Resolution: Monitoring system performance and responding to incidents promptly to minimize downtime and ensure high availability.
  • Capacity Planning: Analyzing system usage patterns and forecasting future capacity needs to ensure that the infrastructure can handle current and future demands.
  • Performance Optimization: Identifying and addressing performance bottlenecks in software systems through optimization and tuning.
  • Infrastructure as Code (IaC): Implementing infrastructure as code practices, using tools like Terraform or Ansible, to define and manage infrastructure in a version-controlled and automated manner.
  • Monitoring and Logging: Implementing and maintaining monitoring and logging solutions to gain insights into system behavior, troubleshoot issues, and proactively address potential problems.
  • On-Call Support: Participating in an on-call rotation to respond to incidents outside of regular working hours and ensure 24/7 system availability.
  • Security: Collaborating with security teams to implement and maintain security best practices in infrastructure and applications.
  • Disaster Recovery Planning: Developing and maintaining disaster recovery plans to ensure that systems can quickly recover from major outages or failures.
  • Continuous Improvement: Continuously analyzing system performance, reliability, and incidents to identify areas for improvement and implementing changes to enhance overall system resilience.

Skills

  • Programming Languages: Proficiency in one or more programming or scripting languages, such as Python, Go, or shell scripting (e.g., Bash).
  • Automation and Scripting: Strong automation skills using tools like Ansible, Puppet, Chef, or custom scripts. Knowledge of Infrastructure as Code (IaC) tools like Terraform.
  • Containerization and Orchestration: Experience with containerization technologies like Docker and container orchestration platforms like Kubernetes.
  • Cloud Computing: Proficiency in at least one cloud platform, such as AWS, Azure, or Google Cloud Platform, and knowledge of managing infrastructure in the cloud.
  • Monitoring and Logging: Familiarity with monitoring tools (e.g., Prometheus, Grafana, ELK stack) and logging frameworks to track system performance and troubleshoot issues.
  • Networking: Understanding of networking concepts and protocols, and skill in troubleshooting network issues.
  • Security: Knowledge of security best practices, including encryption, access controls, and vulnerability management.
  • Continuous Integration/Continuous Deployment (CI/CD): Experience designing and implementing CI/CD pipelines for automated testing and deployment.
  • Load Balancing and Incident Response: Experience with load balancing, and with incident response, troubleshooting, and resolution.
  • Version Control: Proficient use of version control systems like Git.

Experience and Qualifications

  • 1-2 years of experience in site reliability engineering.
  • B.Tech/M.Tech in computer science, information technology, or a related field.
  • Experience working at a product organization is a plus.


Role: Site Reliability Engineer

Industry Type: IT Services & Consulting

Department: Engineering - Software & QA

Employment Type: Full Time, Permanent

Role Category: DevOps

Education

UG: Any Graduate

PG: Any Postgraduate


Job ID: 107480291
