Oracle

Principal Site Reliability Engineer

  • Posted 7 days ago

Job Description

Oracle is seeking a motivated Principal Site Reliability Engineer who thrives in a fast-paced, rapidly evolving technology environment. This role requires broad expertise in Linux administration, automation, cloud computing, networking, cloud security, performance analysis, and monitoring to ensure the stability, security, performance, and reliability of infrastructure.

The Site Reliability Engineer will collaborate with multiple service and product teams to identify and resolve cross-team operational risks using strong engineering, troubleshooting, and operational guidance. The role also demands excellent communication and organizational skills, along with close partnership with service owners, engineers, and developers to deliver a superior support experience for the development community.

Responsibilities

  • Drive incident response, root cause analysis (RCA), and remediation efforts to reduce repeat incidents through systemic fixes.
  • Own and improve service reliability, availability, performance, and operational readiness across critical systems.
  • Troubleshoot and resolve complex issues across Linux infrastructure and Oracle Cloud Infrastructure (OCI).
  • Serve as the escalation point for critical issues lacking documented procedures and deliver the resulting RCA.
  • Develop a comprehensive understanding of end-to-end configurations, technical dependencies, and characteristics of production infrastructure and services.
  • Quickly adapt to new, fast-changing technologies and incorporate them into automation and operational support.
  • Design and deliver mission-critical automation with a strong focus on security, resiliency, scalability, and performance.
  • Create and maintain functional, technical, and SOP documentation.
  • Partner with development teams to define and implement improvements in service architecture.
  • Clearly communicate technical characteristics of services and technologies, guiding cross-functional teams to build and enhance internal tools.

Required Skills

  • 6-12 years of experience in Linux system administration, kernel-level debugging, and performance tuning.
  • Strong expertise in automation, scripting, and development using Python and Terraform.
  • Proven experience supporting fault-tolerant, highly available, scalable distributed systems and production applications.
  • Skilled in troubleshooting across application, compute, storage, and database layers to improve reliability and availability.
  • Hands-on experience with cloud infrastructure, cloud security, compliance, patching, and operations/problem management.
  • Experience collaborating with global teams and working within Agile environments using tools like Jira.
  • Strong logical thinking, continuous learning mindset, teamwork, and excellent communication skills.

Career Level - IC4

About Company

Oracle Corporation is an American multinational computer technology corporation headquartered in Austin, Texas. In 2020, Oracle was the second-largest software company in the world by revenue and market capitalization. The company sells database software and technology (particularly its own brands), cloud engineered systems, and enterprise software products, such as enterprise resource planning (ERP) software, human capital management (HCM) software, customer relationship management (CRM) software (also known as customer experience), enterprise performance management (EPM) software, and supply chain management (SCM) software.

Job ID: 142749815