Microsoft

Senior Service Engineer

  • Posted 10 hours ago

Job Description

Overview

Are you excited about working on one of Microsoft's most strategic and high-visibility cloud platforms? Do you thrive in high-impact environments where you can directly influence customer trust, platform reliability, and Microsoft's cloud-first vision? If you're passionate about cloud technologies, love solving complex distributed-system problems, and excel in fast-paced, engineering-driven environments, Azure Engineering Operations (EngOps) is where you belong.

Azure is the foundation of Microsoft's cloud strategy, powering mission-critical workloads for customers across the globe. Ensuring reliability, rapid mitigation, and high customer confidence is core to Microsoft's promise. As part of Azure EngOps, you will play a pivotal role in strengthening the reliability of Microsoft's cloud, driving rapid incident response, and delivering engineering-backed improvements that directly shape the customer experience across millions of users. This is a unique opportunity to operate like a startup within Microsoft (lean, agile, deeply technical, and customer-obsessed) while impacting one of the largest cloud platforms on the planet. You'll partner closely with engineering teams, PMs, field roles, and strategic enterprise customers to help them get healthy, stay healthy, and succeed on Azure.

If you want to influence platform reliability on a global scale and build a career with long-term engineering depth and leadership visibility, Azure EngOps is the team for you.

We are looking for a Sr. Azure Incident Manager with deep expertise in live-site management, strong command-and-control leadership, and the ability to drive clarity during high-severity incidents across Azure's complex, distributed ecosystem. This role sits within Azure Engineering Operations (EngOps) and plays a pivotal part in safeguarding customer trust by accelerating detection, response, resolution, and systemic improvements.

Responsibilities

  • Lead high-severity Azure incidents with strong command presence and clear decision-making under pressure.
  • Drive the end-to-end incident lifecycle, including detection, triage, mitigation, communication, and post-incident learning.
  • Partner across Azure product groups, EngOps, and field teams to accelerate diagnosis, reduce time-to-mitigation, and drive sustainable fixes.
  • Represent the voice of the customer by surfacing systemic issues, platform gaps, and reliability risks to engineering teams.
  • Drive operational maturity through repeatable processes, strong governance, high-quality execution, and measurable reliability metrics.
  • Identify live-site patterns and hotspots across services and lead cross-team action plans to address them.
  • Convert customer and incident pain points into automation, AI-assisted workflows, and process improvements.
  • Lead or co-own pilots, proofs-of-concept, and tech accelerators that enhance incident response velocity and quality.
  • Contribute to internal playbooks, frameworks, and tooling that leverage AI/ML for improved live-site management.
  • Hands-on experience building or contributing to AI/ML solutions is required; Python experience is a plus.

Qualifications

  • 10+ years of experience in incident management, service engineering, program management, or related technical roles.
  • Strong track record commanding high-pressure, complex, cross-team incidents across cloud or large-scale distributed systems.
  • 5+ years of hands-on experience working with cloud technologies (Azure preferred).
  • Strong understanding of Azure architecture, core services, and internal operational workflows.
  • Exceptional communication skills, with the ability to simplify complex technical issues for senior executives and customers.
  • Experience collaborating in matrixed engineering environments with diverse stakeholders (PG, EngOps, Field, GPMs, PMs, SREs).
  • Strong analytical skills; ability to drive insight from data and influence direction through evidence.
  • Proven experience driving pilots, building prototypes, or contributing to innovation in live-site or automation scenarios.
  • Demonstrated experience in AI/ML-based solutions: automation, anomaly detection, NLP, or reliability tooling. Exposure to Power BI, Kusto (KQL), or other analytical tooling.
  • Relevant technical degree (CS, Engineering, IT) or equivalent experience.
  • Experience with scripting or automation using Python is preferred.

This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.

Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance with religious accommodations and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.

Job ID: 139243801
