Introduction
At IBM Infrastructure & Technology, we design and operate the systems that keep the world running. From high-resiliency mainframes and hybrid cloud platforms to networking, automation, and site reliability, our teams ensure the performance, security, and scalability that clients and industries depend on every day. Working in Infrastructure & Technology means tackling complex challenges with curiosity and collaboration. You'll work with diverse technologies and colleagues worldwide to deliver resilient, future-ready solutions that power innovation. With continuous learning, career growth, and a supportive culture, IBM provides the opportunities to build expertise and shape the infrastructure that drives progress.
Your Role And Responsibilities
As a Site Reliability Engineering Professional, you will specialize in reliability and resiliency with a mix of knowledge and skills in software and systems. You will be responsible for analyzing business needs, determining the root cause of problems, and advising on, designing, building, testing, deploying, and maintaining well-engineered information systems and ecosystems. Your primary responsibilities will include:
- Analyze Business Needs: Analyze business requirements to identify areas for improvement and provide recommendations for enhancing system reliability and resiliency.
- Design and Build Systems: Design, build, test, and deploy changes that keep information systems and ecosystems well engineered and maintainable.
- Problem Determination: Determine the root cause of problems, provide solutions, and implement changes to prevent recurrence.
- Advise on System Maintenance: Provide expert advice on system maintenance, ensuring that systems are running smoothly and efficiently.
- Deploy and Test Changes: Deploy and test changes to ensure that they meet business requirements and do not negatively impact system reliability.
Preferred Education
Bachelor's Degree
Required Technical And Professional Expertise
- Software and Systems Knowledge: Exposure to software and systems engineering principles, with an understanding of how to design, build, test, and deploy reliable and resilient systems.
- Problem Analysis and Solving: Experience working with problem determination techniques, analyzing complex issues, and providing effective solutions to improve system reliability and resiliency.
- System Maintenance and Deployment: Exposure to system maintenance and deployment practices, including testing and validation to ensure smooth and efficient system operation.
- Business Needs Analysis: Experience working with business requirements analysis, identifying areas for improvement, and providing recommendations for enhancing system reliability and resiliency.
- Information Systems and Ecosystems: Exposure to designing, building, and maintaining well-engineered information systems and ecosystems, with a focus on reliability and resiliency.
Preferred Technical And Professional Experience
- Advanced Scripting Skills: Exposure to advanced scripting languages, such as Python or Perl, to automate system maintenance and deployment tasks.
- Cloud Computing Knowledge: Experience working with cloud computing platforms, including designing and deploying scalable and resilient systems.
- IT Service Management: Exposure to IT service management frameworks, such as ITIL, to ensure alignment with industry best practices for system maintenance and deployment.