
About the Technology Organization
Technology at Lilly builds and maintains capabilities using pioneering technologies, as the most prominent tech companies do. What differentiates Technology at Lilly is that we create new possibilities through tech to advance our purpose: creating medicines that make life better for people around the world, through work such as data-driven drug discovery and connected clinical trials. We hire the best technology professionals from a variety of backgrounds, so they can bring an assortment of knowledge, skills, and diverse thinking to deliver solutions in every area of our business.
About the Business Function
The Software Product Engineering (SPE) team is a specialized engineering group that delivers strategic solutions and differentiated capabilities. We take a forward-thinking approach, focusing on an enterprise platform and product mindset, ensuring that the solutions we build can be leveraged across Technology teams for broader impact and efficiency.
Role Summary
As a Principal Support Engineer – Operations (R3), you will be the senior technical authority for production support across a suite of products and services. You will lead complex incident resolution, drive systemic reliability improvements, and influence operational standards across teams. This role expands beyond advanced troubleshooting to include end-to-end ownership of major incidents, deep technical remediation, automation to reduce operational toil, and mentoring of support engineers. You will partner closely with Engineering, Product, QA, Security, and Platform teams to ensure resilient services, strong operational readiness, and measurable improvements in uptime, latency, and customer experience.
What You’ll Be Doing (Key Responsibilities)
1) Advanced Incident Leadership & Resolution
Act as the final escalation point for the most complex, high-impact production issues spanning frontend, backend, integrations, data stores, and cloud infrastructure.
Lead major incident response (swarming/war-room execution), including triage strategy, technical direction, and recovery coordination across multiple teams.
Drive consistent incident execution aligned with incident management expectations (escalation, outage/deviation considerations, and appropriate stakeholder visibility).
2) Problem Management, RCA, and Defect Elimination
Own and drive Root Cause Analysis (RCA) for recurring and severe incidents; identify systemic failure patterns and champion long-term fixes over workarounds.
Partner with engineering to translate RCA outcomes into durable changes (code, configuration, architecture, monitoring, or process), and track fixes to closure with measurable reliability impact.
3) Reliability Engineering & Operational Excellence
Lead initiatives to improve availability, performance, scalability, and operational resilience (e.g., reducing MTTR, improving detection, reducing repeat incidents).
Define and implement operational guardrails: readiness checks, runbooks, rollback patterns, post-release validation, and shift-left operational readiness with Dev/QE.
Contribute to or lead stabilization work consistent with engineering/SRE responsibilities (reliability improvements, defect elimination, major-incident swarming).
4) Observability, Monitoring & Automation
Design and evolve observability across logs, metrics, and traces, and improve signal quality (actionable alerts, noise reduction, meaningful dashboards).
Build automation for common operational tasks (triage, remediation, reporting), using scripting and tooling to reduce manual effort and improve consistency.
5) Deployment & Change Support
Provide senior support for deployments/releases: risk assessment, go/no-go input, rollback readiness, and rapid response for post-release issues.
Improve CI/CD operational safety through better validation, monitoring hooks, and release checklists in partnership with DevOps/Platform teams.
6) Compliance, Security & Regulated Environment Readiness
Ensure support processes and fixes align with internal standards and external regulations (e.g., GDPR, HIPAA where applicable).
Promote secure operational practices: least privilege, auditability, secure debugging, and appropriate handling of sensitive data during incident response.
7) Knowledge Leadership & Mentoring
Raise the operational bar by creating and governing high-quality runbooks, knowledge base articles, and operational standards, and ensure their reuse and adoption across teams.
Mentor L2/R2 engineers through technical coaching on incident handling patterns, RCA quality, and effective cross-team collaboration, acting as a role model for knowledge sharing.
How You Will Succeed (Success Profile)
At R3, success is measured not only by resolving incidents, but by preventing them, improving reliability at scale, and influencing standards across teams:
Be a recognized technical expert who solves complex problems and introduces improved methods/approaches for operations and reliability.
Lead technical decisions during incidents and influence operational standards, technical direction, and cross-team alignment.
Demonstrate strong systems thinking: understand failure modes across distributed services, data stores, networks, and cloud infrastructure.
Drive measurable outcomes (examples): reduced repeat incidents, improved alert quality, lower MTTR, improved SLO attainment, reduced manual toil.
Communicate crisply under pressure, facilitating fast alignment between engineering, product, and stakeholders during major incidents.
What You Should Bring (Qualifications)
Required
7–10 years of experience in application support, production engineering, SRE, or software engineering with strong operations ownership (including high-severity incident response).
Deep hands-on debugging across web applications (frontend + backend), integrations, and production environments.
Strong experience with incident management and ticketing workflows (e.g., ServiceNow, Jira), including major incident execution and RCA.
Strong knowledge of RESTful APIs, databases (e.g., PostgreSQL), caching/data stores (e.g., Redis), and cloud platforms (AWS/Azure/GCP).
Expertise in monitoring/logging/alerting stacks (e.g., CloudWatch, ELK, Datadog, Splunk/AppDynamics or equivalent) and the ability to build actionable observability.
Advanced scripting/automation capability (e.g., Bash, Python, JavaScript) to reduce toil and standardize response.
Experience supporting products in regulated industries, with working knowledge of privacy and security expectations and the secure handling of sensitive data.
Strong collaboration and communication skills across Dev, QA, Product, Security, and platform teams.
Preferred / Nice to Have
Experience defining and operationalizing SLIs/SLOs, error budgets, and reliability reporting (SRE ways of working).
Experience with containerization and deployment patterns (Docker/Kubernetes/ECS), CI/CD systems, and infrastructure-as-code concepts.
Demonstrated mentoring/leadership: raising the capability of teams through coaching and standards.
Additional Information
Availability to work flexible hours is required. This team will support continuous operations across two shifts; therefore, this role will require non-standard work hours, including some work on weekends and holidays. Appropriate adjustments in benefits will be provided for employees working non-standard hours where applicable.
Lilly is dedicated to helping individuals with disabilities to actively engage in the workforce, ensuring equal opportunities when vying for positions. If you require accommodation to submit a resume for a position at Lilly, please complete the accommodation request form for further assistance. Please note this is for individuals to request an accommodation as part of the application process and any other correspondence will not receive a response.
Lilly does not discriminate on the basis of age, race, color, religion, gender, sexual orientation, gender identity, gender expression, national origin, protected veteran status, disability or any other legally protected status.
#WeAreLilly
At Lilly, we unite caring with discovery to make life better for people around the world. We are a global healthcare leader headquartered in Indianapolis, Indiana. Our employees around the world work to discover and bring life-changing medicines to those who need them, improve the understanding and management of disease, and give back to our communities through philanthropy and volunteerism. We give our best effort to our work, and we put people first. We’re looking for people who are determined to make life better for people around the world.
Job ID: 144554865