Principal Support Engineer – Operations

Job Description

About the Technology Organization

Technology at Lilly builds and maintains capabilities using pioneering technologies, as the most prominent tech companies do. What differentiates Technology at Lilly is that we create new possibilities through tech, such as data-driven drug discovery and connected clinical trials, to advance our purpose: creating medicines that make life better for people around the world. We hire the best technology professionals from a variety of backgrounds, so they can bring an assortment of knowledge, skills, and diverse thinking to deliver solutions in every area of our business.

About the Business Function

The Software Product Engineering (SPE) team is a specialized engineering group that delivers strategic solutions and differentiated capabilities. We take a forward-thinking approach, focusing on an enterprise platform and product mindset, ensuring that the solutions we build can be leveraged across Technology teams for broader impact and efficiency.

Role Summary

As a Principal Support Engineer – Operations (R3), you will be the senior technical authority for production support across a suite of products and services. You will lead complex incident resolution, drive systemic reliability improvements, and influence operational standards across teams. This role expands beyond advanced troubleshooting to include end-to-end ownership of major incidents, deep technical remediation, automation to reduce operational toil, and mentoring of support engineers. You will partner closely with Engineering, Product, QA, Security, and Platform teams to ensure resilient services, strong operational readiness, and measurable improvements in uptime, latency, and customer experience.

What You'll Be Doing (Key Responsibilities)

1) Advanced Incident Leadership & Resolution

  • Act as the final escalation point for the most complex, high-impact production issues spanning frontend, backend, integrations, data stores, and cloud infrastructure.

  • Lead major incident response (swarming/war-room execution), including triage strategy, technical direction, and recovery coordination across multiple teams.

  • Drive consistent incident execution aligned with incident management expectations (escalation, outage/deviation considerations, and appropriate stakeholder visibility).

2) Problem Management, RCA, and Defect Elimination

  • Own and drive Root Cause Analysis (RCA) for recurring and severe incidents: identify systemic failure patterns and champion long-term fixes over workarounds.

  • Partner with engineering to translate RCA outcomes into durable changes (code, configuration, architecture, monitoring, or process), and track fixes to closure with measurable reliability impact.

3) Reliability Engineering & Operational Excellence

  • Lead initiatives to improve availability, performance, scalability, and operational resilience (e.g., reducing MTTR, improving detection, reducing repeat incidents).

  • Define and implement operational guardrails: readiness checks, runbooks, rollback patterns, post-release validation, and shift-left operational readiness with Dev/QE.

  • Contribute to or lead stabilization work consistent with engineering/SRE responsibilities (reliability improvements, defect elimination, major-incident swarming).

4) Observability, Monitoring & Automation

  • Design and evolve observability across logs, metrics, and traces, and improve signal quality (actionable alerts, noise reduction, meaningful dashboards).

  • Build automation for common operational tasks (triage, remediation, reporting), using scripting and tooling to reduce manual effort and improve consistency.

5) Deployment & Change Support

  • Provide senior support for deployments/releases: risk assessment, go/no-go input, rollback readiness, and rapid response for post-release issues.

  • Improve CI/CD operational safety through better validation, monitoring hooks, and release checklists in partnership with DevOps/Platform teams.

6) Compliance, Security & Regulated Environment Readiness

  • Ensure support processes and fixes align with internal standards and external regulations (e.g., GDPR, HIPAA where applicable).

  • Promote secure operational practices: least privilege, auditability, secure debugging, and appropriate handling of sensitive data during incident response.

7) Knowledge Leadership & Mentoring

  • Raise the operational bar by creating and governing high-quality runbooks, knowledge base articles, and operational standards, ensuring reusability and adoption across teams.

  • Mentor L2/R2 engineers (technical coaching, incident handling patterns, RCA quality, and effective cross-team collaboration), acting as a role model for knowledge sharing.

How You Will Succeed (Success Profile)

At R3, success is measured not only by resolving incidents, but by preventing them, improving reliability at scale, and influencing standards across teams:

  • Be a recognized technical expert who solves complex problems and introduces improved methods/approaches for operations and reliability.

  • Lead technical decisions during incidents and influence operational standards, technical direction, and cross-team alignment.

  • Demonstrate strong systems thinking: understand failure modes across distributed services, data stores, networks, and cloud infrastructure.

  • Drive measurable outcomes such as reduced repeat incidents, improved alert quality, lower MTTR, improved SLO attainment, and reduced manual toil.

  • Communicate crisply under pressure, facilitating fast alignment between engineering, product, and stakeholders during major incidents.

What You Should Bring (Qualifications)

Required

  • 7–10 years of experience in application support, production engineering, SRE, or software engineering with strong operations ownership (including high-severity incident response).

  • Deep hands-on debugging across web applications (frontend + backend), integrations, and production environments.

  • Strong experience with incident management and ticketing workflows (e.g., ServiceNow, Jira), including major incident execution and RCA.

  • Strong knowledge of RESTful APIs, databases (e.g., PostgreSQL), caching/data stores (e.g., Redis), and cloud platforms (AWS/Azure/GCP).

  • Expertise in monitoring/logging/alerting stacks (e.g., CloudWatch, ELK, Datadog, Splunk/AppDynamics, or equivalent) and the ability to build actionable observability.

  • Advanced scripting/automation capability (e.g., Bash, Python, JavaScript) to reduce toil and standardize response.

  • Experience supporting products in regulated industries, with working knowledge of privacy/security expectations and secure handling of sensitive data.

  • Strong collaboration and communication skills across Dev, QA, Product, Security, and Platform teams.

Preferred / Nice to Have

  • Experience defining and operationalizing SLIs/SLOs, error budgets, and reliability reporting (SRE ways of working).

  • Experience with containerization and deployment patterns (Docker/Kubernetes/ECS), CI/CD systems, and infrastructure-as-code concepts.

  • Demonstrated mentoring/leadership: raising the capability of teams through coaching and standards.

Additional Information

Availability to work flexible hours is required. This team supports continuous operations across two shifts; therefore, this role will require non-standard work hours and some work on weekends and holidays. Appropriate adjustments in benefits will be provided for employees working non-standard hours, where applicable.

Lilly is dedicated to helping individuals with disabilities to actively engage in the workforce, ensuring equal opportunities when vying for positions. If you require accommodation to submit a resume for a position at Lilly, please complete the accommodation request form for further assistance. Please note this form is for individuals to request an accommodation as part of the application process; any other correspondence will not receive a response.

Lilly does not discriminate on the basis of age, race, color, religion, gender, sexual orientation, gender identity, gender expression, national origin, protected veteran status, disability or any other legally protected status.

#WeAreLilly

About Company

At Lilly, we unite caring with discovery to make life better for people around the world. We are a global healthcare leader headquartered in Indianapolis, Indiana. Our employees around the world work to discover and bring life-changing medicines to those who need them, improve the understanding and management of disease, and give back to our communities through philanthropy and volunteerism. We give our best effort to our work, and we put people first. We're looking for people who are determined to make life better for people around the world.

Job ID: 144554865