Principal Support Engineer – Operations

Lilly

Hyderabad, India

7-10 Years

Job Description

About the Technology Organization

Technology at Lilly builds and maintains capabilities using pioneering technologies like most prominent tech companies. What differentiates Technology at Lilly is that we create new possibilities through tech to advance our purpose – creating medicines that make life better for people around the world, like data-driven drug discovery and connected clinical trials. We hire the best technology professionals from a variety of backgrounds, so they can bring an assortment of knowledge, skills, and diverse thinking to deliver solutions in every area of our business.

About the Business Function

The Software Product Engineering (SPE) team is a specialized engineering group that delivers strategic solutions and differentiated capabilities. We take a forward-thinking approach, focusing on an enterprise platform and product mindset, ensuring that the solutions we build can be leveraged across Technology teams for broader impact and efficiency.

Role Summary

As a Principal Support Engineer – Operations (R3), you will be the senior technical authority for production support across a suite of products and services. You will lead complex incident resolution, drive systemic reliability improvements, and influence operational standards across teams. This role expands beyond advanced troubleshooting to include end-to-end ownership of major incidents, deep technical remediation, automation to reduce operational toil, and mentoring of support engineers. You will partner closely with Engineering, Product, QA, Security, and Platform teams to ensure resilient services, strong operational readiness, and measurable improvements in uptime, latency, and customer experience.

What You’ll Be Doing (Key Responsibilities)

1) Advanced Incident Leadership & Resolution

Act as the final escalation point for the most complex, high-impact production issues spanning frontend, backend, integrations, data stores, and cloud infrastructure.

Lead major incident response (swarming/war-room execution), including triage strategy, technical direction, and recovery coordination across multiple teams.

Drive consistent incident execution aligned with incident management expectations (escalation, outage/deviation considerations, and appropriate stakeholder visibility).

2) Problem Management, RCA, and Defect Elimination

Own and drive Root Cause Analysis (RCA) for recurring and severe incidents; identify systemic failure patterns and champion long-term fixes over workarounds.

Partner with engineering to translate RCA outcomes into durable changes (code, configuration, architecture, monitoring, or process), and track fixes to closure with measurable reliability impact.

3) Reliability Engineering & Operational Excellence

Lead initiatives to improve availability, performance, scalability, and operational resilience (e.g., reducing MTTR, improving detection, reducing repeat incidents).

Define and implement operational guardrails: readiness checks, runbooks, rollback patterns, post-release validation, and shift-left operational readiness with Dev/QE.

Contribute to or lead stabilization work consistent with engineering/SRE responsibilities (reliability improvements, defect elimination, major-incident swarming).

4) Observability, Monitoring & Automation

Design and evolve observability across logs/metrics/traces; improve signal quality (actionable alerts, noise reduction, meaningful dashboards).

Build automation for common operational tasks (triage, remediation, reporting), using scripting and tooling to reduce manual effort and improve consistency.

5) Deployment & Change Support

Provide senior support for deployments/releases: risk assessment, go/no-go input, rollback readiness, and rapid response for post-release issues.

Improve CI/CD operational safety through better validation, monitoring hooks, and release checklists in partnership with DevOps/Platform teams.

6) Compliance, Security & Regulated Environment Readiness

Ensure support processes and fixes align with internal standards and external regulations (e.g., GDPR, HIPAA where applicable).

Promote secure operational practices: least privilege, auditability, secure debugging, and appropriate handling of sensitive data during incident response.

7) Knowledge Leadership & Mentoring

Raise the operational bar by creating and governing high-quality runbooks, knowledge base articles, and operational standards; ensure reusability and adoption across teams.

Mentor L2/R2 engineers: technical coaching, incident handling patterns, RCA quality, and effective cross-team collaboration, acting as a role model for knowledge sharing.

How You Will Succeed (Success Profile)

At R3, success is measured not only by resolving incidents, but by preventing them, improving reliability at scale, and influencing standards across teams:

Be a recognized technical expert who solves complex problems and introduces improved methods/approaches for operations and reliability.

Lead technical decisions during incidents and influence operational standards, technical direction, and cross-team alignment.

Demonstrate strong systems thinking; understand failure modes across distributed services, data stores, networks, and cloud infrastructure.

Drive measurable outcomes (examples): reduced repeat incidents, improved alert quality, lower MTTR, improved SLO attainment, reduced manual toil.

Communicate crisply under pressure, facilitating fast alignment between engineering, product, and stakeholders during major incidents.

What You Should Bring (Qualifications)

Required

7–10 years of experience in application support, production engineering, SRE, or software engineering with strong operations ownership (including high-severity incident response).

Deep hands-on debugging across web applications (frontend + backend), integrations, and production environments.

Strong experience with incident management and ticketing workflows (e.g., ServiceNow, Jira), including major incident execution and RCA.

Strong knowledge of RESTful APIs, databases (e.g., PostgreSQL), caching/data stores (e.g., Redis), and cloud platforms (AWS/Azure/GCP).

Expertise in monitoring/logging/alerting stacks (e.g., CloudWatch, ELK, Datadog, Splunk/AppDynamics or equivalent) and the ability to build actionable observability.

Advanced scripting/automation capability (e.g., Bash, Python, JavaScript) to reduce toil and standardize response.

Experience supporting products in regulated industries, with working knowledge of privacy/security expectations and secure handling.

Strong collaboration and communication skills across Dev, QA, Product, Security, and platform teams.

Preferred / Nice to Have

Experience defining and operationalizing SLIs/SLOs, error budgets, and reliability reporting (SRE ways of working).

Experience with containerization and deployment patterns (Docker/Kubernetes/ECS), CI/CD systems, and infrastructure-as-code concepts.

Demonstrated mentoring/leadership: raising the capability of teams through coaching and standards.

Additional Information

Availability to work flexible hours is/may be required. This team supports continuous operations across two shifts; this role will therefore require non-standard work hours and some work on weekends and holidays. Appropriate adjustments in benefits will be provided for employees working non-standard hours, where applicable.

Lilly is dedicated to helping individuals with disabilities to actively engage in the workforce, ensuring equal opportunities when vying for positions. If you require accommodation to submit a resume for a position at Lilly, please complete the accommodation request form for further assistance. Please note this is for individuals to request an accommodation as part of the application process and any other correspondence will not receive a response.

Lilly does not discriminate on the basis of age, race, color, religion, gender, sexual orientation, gender identity, gender expression, national origin, protected veteran status, disability or any other legally protected status.

#WeAreLilly

About Company

Lilly

Job Source: careers.lilly.com

At Lilly, we unite caring with discovery to make life better for people around the world. We are a global healthcare leader headquartered in Indianapolis, Indiana. Our employees around the world work to discover and bring life-changing medicines to those who need them, improve the understanding and management of disease, and give back to our communities through philanthropy and volunteerism. We give our best effort to our work, and we put people first. We’re looking for people who are determined to make life better for people around the world.

Job ID: 144554865

Last Updated: 18-03-2026 06:33:43 PM
