Job Summary
Senior hands-on engineer who leads break-fix and critical-incident response across customer data centers. Fixes the hard problems, runs the bridge during outages, and makes sure failures don't repeat.
Key Responsibilities
- Lead on-site break-fix and critical-incident response: triage, troubleshoot, and resolve.
- Act as technical lead on Sev-1/major-incident bridges; coordinate field engineers, TAC, OEM vendors, and the customer until the incident is closed.
- Work hands-on across servers, storage, network hardware, structured cabling (copper/fiber/MPO), power/PDU, and rack & stack.
- Run RCA on major incidents and close the corrective actions so the same fault doesn't recur.
- Support hardware replacements, upgrades, and migrations.
- Document fixes and keep runbooks/SOPs current so the team resolves faster next time.
- Mentor field engineers on tougher diagnostics.
Education & Experience
- 5–8 years of hands-on data center/infrastructure field experience; 3+ years in break-fix (enterprise or hyperscale).
- Strong troubleshooting skills across server, network, cabling, and power, with good judgment on when to escalate.
- Able to own a Sev-1 bridge under pressure with the customer on the line.
- Working knowledge of ServiceNow / Jira ticketing.
- A degree is helpful but not required; field experience counts.
- Preferred: CCNA, CDCP, or ITIL Foundation certification.
- Preferred: hyperscale site experience.
Work Requirements
- Participation in an on-call rotation and shift coverage; incidents don't keep business hours.
- 20% travel within the region.
- On-site physical work, including lifting and racking equipment and working in data center floor conditions.
- Ability to pass the site badging and background checks required by hyperscale customers.
Performance Metrics
- RCAs completed and corrective actions closed.
- Reduction in repeat incidents on your accounts.
- Escalation speed and quality.
- Runbooks/SOPs contributed and adopted by the team.
- Shared team scorecard: MTTR, first-time-fix rate, CSAT.