Site Reliability Engineer 2

ModMed Technologies India Private Ltd

Hyderabad

4-8 Years

Posted 9 months ago

Job Description

Key Responsibilities:

Architect and manage secure, scalable cloud infrastructure and services, focusing on automation, reliability, and proactive cost management to ensure efficient operations.
Implement and refine observability and monitoring solutions using DataDog, ensuring proactive issue identification and efficient resource utilization.
Lead CI/CD pipeline development, maintenance, and optimization with Jenkins, integrating AWS services to enhance development workflows and infrastructure automation.
Drive the containerization and orchestration of applications using Kubernetes, enhancing scalability, deployment efficiency, and cost-effectiveness.
Monitor application and infrastructure performance in AWS, tuning systems to optimize resource utilization and user experience while managing costs.
Design and manage disaster recovery and backup strategies on AWS, prioritizing data integrity, system availability, and cost efficiency.
Provide expert troubleshooting and problem-solving across various platforms and applications within AWS, aiming for minimal disruption and quick resolution.
Ensure strict adherence to AWS security standards and compliance with data protection regulations, with a keen eye on cost implications.
Keep abreast of new cloud technologies and trends, recommending and implementing improvements for competitive advantage and cost savings.
Mentor and support junior team members, fostering a culture of learning, collaboration, and cost-consciousness.
Work closely with cross-functional teams to understand requirements and deliver AWS-based solutions that meet business objectives efficiently and cost-effectively.

Qualifications:

Bachelor's degree in Computer Science, Information Technology, or related field, or equivalent experience.
A minimum of 3 years of experience in Site Reliability Engineering, Cloud Engineering, or a similar role, with a demonstrated track record of problem-solving in complex, cloud-based environments. This should include extensive experience with designing, implementing, and managing scalable, highly available, and fault-tolerant systems.
Strong expertise in managing cloud environments (preferably in AWS), with hands-on experience in observability platforms such as DataDog.
Proficiency in automation and scripting languages (e.g., Python, Bash) and infrastructure as code (IaC) tools (e.g., Terraform, Ansible).
Extensive experience with CI/CD tools, notably Jenkins, and familiarity with containerization and orchestration technologies like Kubernetes.
Solid understanding of networking, cloud security best practices, performance optimization, and cost management strategies.
Demonstrated commitment to implementing industry-standard site reliability principles and a proactive approach to cost management in daily operations.
Proven leadership skills and the ability to mentor junior team members, guide teams through complex operational challenges, and foster a culture of continuous improvement.
Excellent verbal and written communication skills, with the ability to work effectively in a team environment and communicate complex technical concepts to a non-technical audience.