Oracle

Site Reliability Developer 4

6-10 Years
  • Posted 4 hours ago

Job Description

As a Site Reliability Engineer, you will work with the Production Engineering and SRE teams to own, run, and improve critical healthcare services. You will help keep our cloud-native EHR platforms reliable, secure, scalable, and easy to operate.

You will understand how services are built, deployed, monitored, and supported in production. You will work closely with development teams to improve service design, reduce failures, automate manual work, and improve performance.

You will also help develop and use AI and AIOps to improve operations, including smarter alerting, faster incident detection, automated troubleshooting, and better root cause analysis.

Key Responsibilities

  • Own the reliability, availability, performance, and operations of production services.
  • Support cloud-native EHR platforms built with microservices, Kubernetes, and OCI.
  • Understand service architecture, dependencies, capacity, security, and failure points.
  • Improve monitoring, alerting, observability, and incident response.
  • Use AI, automation, and AIOps to reduce manual work and improve system health.
  • Build tools and scripts for deployment, monitoring, recovery, and operational tasks.
  • Troubleshoot complex production issues and drive them to resolution.
  • Lead root cause analysis for major incidents and help prevent repeat issues.
  • Partner with development teams to improve service design and operability.
  • Create and maintain SOPs, runbooks, dashboards, and knowledge articles.
  • Support migration and modernization of existing hosting environments to OCI.
  • Review code, improve engineering practices, and mentor team members.
  • Work with product, development, support, and cloud teams to deliver reliable healthcare solutions.
  • Participate in 24x7 on-call rotation for critical services.

AI and Automation Focus

  • Design and support AI-driven operational automation.
  • Use AI/AIOps for anomaly detection, alert correlation, and incident insights.
  • Help build self-healing and auto-remediation capabilities.
  • Apply AI safely to improve reliability, supportability, and customer experience.
  • Work with engineering teams to bring applied AI into production operations.

What You Bring

  • 6 to 10+ years of experience with production systems or distributed platforms.
  • Strong experience with Java and scripting using Python or Shell.
  • Good knowledge of microservices, Kubernetes, and cloud platforms.
  • Experience with OCI, AWS, Azure, or GCP.
  • Strong troubleshooting and debugging skills.
  • Experience with monitoring, logging, alerting, and observability tools.
  • Knowledge of REST APIs, JSON/XML, SQL, and secure data handling.
  • Experience with automation, CI/CD, and production deployment.
  • Ability to handle customer-impacting issues and technical escalations.
  • Experience with AI/ML, AIOps, or automation in production is a plus.

Nice to Have

  • Experience with EHR or healthcare platforms.
  • Knowledge of HL7 or FHIR.
  • Oracle Health or New Millennium experience.
  • Oracle Database experience.
  • Strong Kubernetes and OCI experience.

Career Level - IC4

About Company

Oracle Corporation is an American multinational computer technology corporation headquartered in Austin, Texas. In 2020, Oracle was the second-largest software company in the world by revenue and market capitalization. The company sells database software and technology (particularly its own brands), cloud engineered systems, and enterprise software products, such as enterprise resource planning (ERP) software, human capital management (HCM) software, customer relationship management (CRM) software (also known as customer experience), enterprise performance management (EPM) software, and supply chain management (SCM) software.

Job ID: 149515983