HNM Solutions

Site Reliability Engineer - Lead

12-15 Years
3 months ago

Job Description

Site Reliability Engineer - Lead: minimum of 12 years of experience in IT, with at least 8 years in
monitoring.
The ideal candidate has a strong background in software engineering, monitoring, and
operations, with a focus on ensuring the reliability, performance, and
scalability of our web applications.
Skills:
Strong understanding of modern single-page web applications (Angular/React,
Node.js, etc.) and mobile applications.
Deep knowledge of monitoring and observability tools (e.g., Dynatrace,
Prometheus, Grafana, ELK stack, Datadog, AppDynamics, New Relic).
Familiarity with configuration management tools (Ansible, Puppet, etc.)
and shell scripting.
AWS Cloud: VPC, subnets, network access control lists, security groups,
EC2 instances, S3 buckets, IAM, Route 53, Lambda.
Experience with containerization and virtualization technologies such as Docker, VMs, and Kubernetes.
Strong knowledge of SRE principles and their application to implementing
monitoring.
Responsibilities:
1. Monitoring and Alerting:
Implement and manage monitoring solutions to track the health and
performance of services.
Proactively monitor application stability.
Set up alerting and automated responses to minimize downtime.
Perform root cause analysis and manage incidents for issue resolution.
Monitor system performance, identify bottlenecks, and collaborate on
optimizations.
2. Service Reliability:
Ensure the reliability and availability of our web applications by setting
and meeting Service Level Objectives (SLOs).
Collaborate with development teams to improve the overall reliability
of applications and services.
3. Automation:
Develop and maintain automation scripts and tools for repetitive
operational tasks.
4. Continuous Product Improvement:
Maintain open communication with the Product Owner for product
alignment.
Ensure SRE tasks align with the product's strategic goals.
Participate in backlog refinement meetings to prioritize SRE-related
work items.
Identify, document, and communicate defects and improvement
opportunities.
5. Capacity Planning:
Conduct capacity planning to ensure that systems can handle expected
loads.
Analyze data and predict future resource requirements, scaling systems
as needed.
6. Incident Response:
Participate in an on-call rotation to respond to incidents and outages
promptly.
Follow incident management procedures and conduct post-incident
reviews.
7. Change Management:
Assess risks associated with changes to the production environment.
Coordinate and execute deployments, ensuring rollback plans are in
place.
8. Performance Analysis:
Analyze performance bottlenecks and work on optimizing systems for
efficiency and cost-effectiveness.
9. Documentation:
Maintain comprehensive documentation for systems, processes, and
procedures.
10. Collaboration:
Work closely with cross-functional teams, including development,
operations, and security, to achieve common goals.
Foster a culture of reliability within the organization.
11. Other:
Execute releases and contribute to the deployment process.
Provide on-call support.
EDUCATION
Bachelor Of Technology (B.Tech/B.E)

https://www.hnmsolutions.eu/

FUNCTIONS
IT/ Software Development - Application Programming/ Maintenance
INDUSTRY
IT/ Computers - Software
SKILLS/ROLES I HIRE FOR
Full Stack Java Developer
LEVEL HIRING FOR
Mid Level
Last Updated: 27-05-2024 11:25:28 AM