Principal Member of Technical Staff - TechOps

  • Posted 3 hours ago

Job Description

Join us as we work to create a thriving ecosystem that delivers accessible, high-quality, and sustainable healthcare for all.

Role summary:

Help keep athenahealth's platform reliable and scalable by designing and delivering technical operations capabilities that improve resilience, speed, and operational efficiency. This is a hybrid role based in Whitefield, Bangalore, India. It focuses on cloud infrastructure operations, automation, and platform migration work across AWS and Kubernetes environments, with strong cross-functional coordination to enable safe and frequent product releases. The role reports to the Director Engineering - Tech Ops.

Team summary:

CE SRE ensures the reliable operation of the platform hosted in a hybrid cloud environment. We partner closely with product teams to scale the platform effectively and enable faster, more efficient product feature releases. InfraOps is a sub-team within CE, focused on operating and scaling the hybrid cloud infrastructure. We are currently undergoing a large-scale migration from a self-managed AWS/Kubernetes/container-based environment to a centrally managed platform. At the same time, we continue to provide 24x7 enterprise-grade operational and user support. To drive efficiency and resilience, we leverage AI-assisted automation, Terraform Enterprise (TFE) for Infrastructure as Code (IaC), and advanced monitoring and alerting capabilities to enhance productivity and operational excellence.

In this environment, the team balances near-term operational needs with long-term platform modernization. Work includes improving deployment safety through CI/CD practices, strengthening observability to reduce time-to-detect and time-to-recover, and building repeatable infrastructure patterns that support consistency across services. The team collaborates across engineering, security, and product stakeholders to ensure changes are delivered with appropriate controls, clear communication, and measurable reliability outcomes.

Essential Job Responsibilities:

  • Design and implement scalable infrastructure and operational patterns across AWS, Kubernetes, and hybrid cloud environments to improve platform reliability and performance.

  • Develop automation and tooling in Python to reduce manual operational work and improve consistency across common workflows.

  • Lead infrastructure-as-code practices using Terraform and Terraform Enterprise (TFE), including module standards, change controls, and reusable patterns.

  • Drive CI/CD improvements that increase deployment safety, reduce lead time for changes, and support consistent release practices across teams.

  • Partner with product and engineering teams to plan and execute platform migration work from self-managed environments to centrally managed platforms, managing dependencies and risks.

  • Coordinate cross-functionally with stakeholders to align priorities, communicate operational readiness, and ensure smooth execution of complex initiatives.

  • Strengthen operational excellence through improvements to monitoring, alerting, incident response practices, and post-incident corrective actions.

Additional Job Responsibilities:

  • Contribute to technical strategy by identifying opportunities to improve resilience, cost efficiency, and operational scalability across the platform.

  • Document operational standards, runbooks, and migration playbooks to support consistent execution and onboarding.

  • Support evaluation and adoption of AI-assisted automation approaches that improve productivity and reduce repetitive work.

  • Review designs and changes for operational readiness, including rollback planning, capacity considerations, and failure-mode analysis.

  • Collaborate with security and compliance partners to ensure infrastructure changes align with required controls and policies.

  • Facilitate knowledge sharing across teams on infrastructure patterns, CI/CD practices, and incident learnings.

  • Assist with prioritization and planning by translating operational needs into scoped projects with clear milestones and measurable outcomes.

Expected Education & Experience (Required):

  • 12-18 years of experience in technical operations, site reliability engineering, infrastructure engineering, or related roles supporting production systems.

  • Demonstrated hands-on experience with coding, including Python, to build automation and operational tooling.

  • Strong experience designing or operating AWS architecture in production environments.

  • Very strong Linux experience.

  • Enterprise-grade AWS/Kubernetes implementation and management.

  • IaC with TFE; Harness or similar for CI/CD; Ansible; Redis for caching; advanced alerting and monitoring such as Datadog; message handling such as Kafka.

  • Hands-on Python scripting experience.

  • Strong experience with generative AI automation tools such as Windsurf, Copilot, Claude, or Codex.

  • Proven experience with Kubernetes and container-based platforms, including reliability and scaling considerations.

  • Experience with Terraform (including enterprise IaC practices such as module reuse, reviews, and controlled rollouts).

  • Experience improving or operating CI/CD pipelines and release processes in collaboration with engineering teams.

  • Demonstrated stakeholder management, cross-functional collaboration, and project management experience for complex, multi-team initiatives.

  • Bachelor's degree in Computer Science, Engineering, or equivalent practical experience.

Good to Have:

  • Understanding of object-relational databases such as Postgres.

  • Logging, monitoring, and metrics.

  • Dashboards.

Work split:

  • Hands-on work: coding, solution design and implementation, and enterprise-grade operational support, which might include on-call duty on a rotational basis.

  • Stakeholder and project management.

  • Global team mentorship and management as SPOC for tech and ops lead.

More Info

Job ID: 143880883