Description
Support system reliability, automation, and operational efficiency by developing automation tools, improving monitoring systems, and contributing to infrastructure management.
- The role focuses on reducing manual operational tasks while ensuring high availability and performance of production systems.
- Experience required: 3 to 5 years.
Key Responsibilities
Automation Development
- Develop and maintain automation scripts, tools, and workflows using technologies such as Python and Bash to automate manual operational tasks.
System Reliability
- Assist in managing service reliability and availability, including monitoring, alerting, and incident response processes.
Infrastructure as Code (IaC)
- Contribute to configuration management and Infrastructure as Code implementations using tools like Ansible, Terraform, and Puppet.
Monitoring & Observability
- Build, tune, and maintain monitoring dashboards and observability systems using tools such as Prometheus, Grafana, and Datadog to ensure system health and performance.
CI/CD Pipeline Maintenance
- Improve and maintain continuous integration and deployment pipelines to streamline application deployments and infrastructure updates.
Troubleshooting & Incident Response
- Participate in on-call rotations to diagnose, troubleshoot, and resolve production incidents efficiently.
What we need
- Bachelor's or Master's degree in Computer Science, Information Technology, Engineering, or a related field.
- Job Location: Hyderabad, Indore, and Ahmedabad (India).
(ref:hirist.tech)