Role Overview:
We are seeking a Senior Cloud Site Reliability Engineer to enhance the reliability, scalability, and performance of our cloud-based products and solutions. You will work in a collaborative environment with operations engineers and software developers to ensure seamless system operations, automate processes, and proactively address challenges.
Key Responsibilities:
- Analyze existing cloud infrastructure and propose scalable, efficient solutions.
- Lead incident management, conducting root cause analysis and implementing preventative measures.
- Develop strategies to improve MTBF (Mean Time Between Failures) and reduce MTTR (Mean Time to Recovery).
- Optimize and automate operational procedures for enhanced system efficiency.
- Monitor and troubleshoot performance issues across infrastructure, software, and networks.
- Research and advocate for emerging cloud technologies and best practices.
- Collaborate with operations and development teams to enhance system reliability, architecture, and design.
- Design and execute automated tests to validate software and infrastructure reliability.
Required Qualifications:
- 3+ years of experience in Site Reliability Engineering or a related role.
- 2+ years of hands-on experience with AWS services (an AWS Certified Solutions Architect or AWS Certified DevOps Engineer certification is mandatory).
- Strong knowledge of AWS services (EC2, RDS, Lambda, CloudFront, ELB, API Gateway).
- Experience with Linux/Unix and Windows systems, networking, and firewall concepts.
- Proficiency with CI/CD tools (Jenkins, TeamCity) and Git-based version control platforms (Bitbucket).
- Advanced scripting skills (Python preferred).
- Strong understanding of system reliability, performance tuning, and scalability.
- Experience with cloud-native services, network technologies, and fault-tolerant system design.
- Expertise in relational databases (PostgreSQL, MySQL), including cloud-hosted deployments.
- Familiarity with monitoring tools (Splunk, Datadog, or equivalent).
Preferred Qualifications:
- Bachelor's/Master's degree in Computer Science, Engineering, or a related field.
- Experience with big data technologies (Spark, Hadoop) and Scala.
- Strong problem-solving and analytical skills with a proactive mindset.
- Excellent communication skills and ability to work in global, cross-functional teams.
- Quick adaptability to new platforms, tools, and technologies.
Why Join Us:
- Work on cutting-edge cloud solutions with a global impact.
- Be part of a collaborative, high-performing team in an innovative environment.
- Drive mission-critical projects that enhance system reliability and scalability.
- Stay at the forefront of cloud technology with continuous learning and growth opportunities.
Apply Now!