If you are someone who:
- Wants to understand what it takes to build a scalable, secure, and reliable service.
- Desires to deepen your technical expertise in all aspects of Site Reliability Engineering, including security, monitoring, automation, development, infrastructure, self-healing, and troubleshooting.
- Is a go-getter with an ownership mindset.
...then we might be the right team for you!
Your Role and Responsibilities
As a Site Reliability Engineer, you will be responsible for:
- Applying a logical, methodical, and analytical approach to isolate and solve technical problems.
- Communicating and collaborating effectively with other technicians, departments, and customers in technical support situations.
- Demonstrating and applying extensive knowledge of the company's products.
- Exercising limited discretion in deviating from standard practice to solve problems within your area of experience.
- Researching problems and recommending solutions.
- Assisting with on-the-job training.
- Professionally processing and resolving asset request cases to support proper accounting for site inventory.
- Maintaining site inventory with zero discrepancies through strict adherence to asset management procedures.
- Managing inbound and outbound fulfillment to enable the achievement of business deliverables.
- Ensuring that any asset defects, damages, discrepancies, or deviations are escalated to management for timely support and resolution.
- Working in shift rotations that may include day, evening, overnight, and/or weekends and holidays.
- Working in various IBM Cloud locations in Chennai.
Required Education
Required Technical and Professional Expertise
- 2+ years of experience, including:
- Physical server hardware experience (assembling servers, including motherboards, RAM, hard drives, RAID controllers, network cards, etc.). OS experience is a plus, but the emphasis is on physical server hardware exposure.
- Scheduling and performing hardware maintenance for IBM Cloud customers, including upgrades, downgrades, and support requests. This covers physical upgrades and downgrades of server hardware such as RAM, hard drives, processors, and network cards.
- Troubleshooting and resolving problems with basic physical network cable and device connections at the server, switch, or stack level inside the Data Center.
- 100% onsite experience working in a physical Data Center (no remote support).
- 24/7 operations with 100% onsite support (no remote or on-call support). This involves a rotating shift schedule with no fixed shifts.
- Responding to UIP-related events around physical infrastructure.
- Coordinating with internal departments to resolve outage events related to faulty links, optics, or failed networking devices.
- Possessing physical server infrastructure knowledge along with a basic understanding of network devices (such as physical network cabling, optics, and interconnections) to provide onsite support to remote network teams and Network ISP personnel.
- Handling outage events under the supervision and guidance of Site Management.
Preferred Technical and Professional Experience
- May be directed to perform other duties consistent with training and skill levels required for this position.
- Understanding site capacity utilization and assisting Management with capacity planning.