Oracle

Senior Principal Core Infrastructure Engineer

10-12 Years
  • Posted 2 days ago

Job Description

Architects and leads design of interdependent, elastic distributed systems for hyper‑scale performance and reliability. Defines scalability requirements with stakeholders, identifies and removes bottlenecks, and leverages data plane platforms for large‑scale operations. Engineers fault‑tolerant designs that sustain in‑service updates, handle partitions and unreliable networks (load‑shedding, throttling, rate‑limiting), and set SLO‑aligned durability and availability standards. Establishes KPIs and advanced telemetry, formally verifies complex features, and defines replication/synchronization strategies for integrity. Leads critical incident resolution and operational readiness, holding partners to SOPs while mentoring others. Architects advanced security controls and oversees remediation, and drives comprehensive automation (IaC) and change plans enabling safe, automated patching, updates, and rollbacks.

Responsibilities

Key Responsibilities

System Design & Architecture - System Scalability:

  • Leads the architecture and design of interdependent distributed systems, ensuring horizontal and vertical scalability and overall performance including leveraging distributed state management tools.
  • Identifies performance and scalability bottlenecks to optimize code and/or systems for large-scale data processing and high-throughput requirements to improve performance for hyper-scale systems.
  • Collaborates with stakeholders to define system scalability requirements, ensuring the defined requirements meet customer expectations.
  • Designs interdependent systems to scale with elasticity (e.g., effectively scaling both up and down).
  • Leverages and implements data plane platforms for large-scale data operations.
  • Evaluates whether systems meet nonfunctional scalability requirements, and proactively anticipates system failures so that those requirements continue to be met.

System Design & Architecture - System Reliability Design:


  • Architects comprehensive fault-tolerant interdependent systems capable of withstanding in-service updates by implementing advanced redundancy, replication, and automatic failover mechanisms.
  • Leads the design of systems that effectively handle service disruptions (e.g., handling network partitions by prioritizing consistency, availability, or partition tolerance).
  • Optimizes and designs advanced techniques for handling network unreliability, including load-shedding, throttling, and rate-limiting.
  • Designs systems that are durable and adhere to service level objectives (SLOs), driving standards for availability and durability of other computing services within the department.

System Design & Architecture - System Reliability Performance:


  • Defines key performance indicators (KPIs) and telemetry to proactively identify risks, gaps, or cyclical dependencies in running systems.
  • Leads the creation and customization of complex dashboards, telemetry systems, and alerting mechanisms to proactively monitor and ensure optimal system health.

System Design & Architecture - Correctness / Availability:


  • Evaluates if systems are meeting functional and correctness requirements, and identifies improvement opportunities.
  • Formally verifies complex features (e.g., via TLA+) to ensure system design correctness.
  • Develops strategies for data replication and synchronization to maintain data integrity and availability.

Operational Troubleshooting & Incident Management:


  • Leads efforts and mentors others in troubleshooting, diagnosing, debugging, and resolving critical issues in active systems to support ongoing operation.
  • Implements advanced strategies to prevent interruptions, ensuring no maintenance windows are required for customers and users when resolving issues.
  • Maintains expertise in dependencies and owned components and systems to ensure effective troubleshooting and performance.
  • Reviews and approves operational readiness, standard operating procedures, and holds internal partners accountable for meeting those standards.
  • Provides guidance on coordinating operational support rotations, offers expert guidance in incident response, and conducts comprehensive root cause investigations.

Compliance & Security:


  • Architects comprehensive security measures to protect data and applications in multi-tenant environments, including advanced encryption and access controls.
  • Develops and oversees the execution of remediation plans to address identified security vulnerabilities.
  • Mentors others to ensure that cloud infrastructure complies with industry standards and regulations and that documentation is up to date.

Automation & Change Management:


  • Oversees the development of comprehensive automation tools and scripts (e.g., Infrastructure as Code (IaC)) for cloud infrastructure management.
  • Creates change management plans for patching, updating, and rolling back applications, and designs systems to allow for automation of these processes.

Core Responsibilities


Planning & Execution:

  • Oversees and tracks timelines and/or budgets for large-scale projects or initiatives to ensure timely progress and adherence to requirements. Strategically balances multiple projects and adjusts plans to accommodate shifts in resources or schedules, mitigating risks to project outcomes.

Collaboration & Partnership:


  • Fosters collaboration across the line of business and with external stakeholders to ensure alignment of expectations and strategic objectives. Builds and maintains partnerships with business leaders, stakeholders, and/or customers to address barriers and contribute to organizational success. Drives transparency and inclusivity by actively seeking, listening to, and leveraging diverse perspectives.

Problem Solving:


  • Develops and refines problem-solving strategies and serves as an escalation point for complex issues across multiple projects or teams. Leads the analysis of complex data and/or information to identify patterns and root causes, reviewing recommendations for resolution, and implementing solutions that prevent future issues.

Continuous Learning:


  • Builds expertise within one's area and actively pursues learning opportunities to stay current with the latest industry trends and best practices. Acts as a role model for continuous learning by identifying new areas to grow skills. Applies new knowledge to drive advancement and mentors others to do the same, fostering a culture of continuous learning and knowledge sharing.

Continuous Improvement:


  • Develops and leads efforts to implement ideas that increase the efficiency and effectiveness of processes, protocols, and workflows across teams, and evaluates the impact on key stakeholders. Actively encourages the team to recommend ideas for improvement and provide feedback on approaches and methods for continued improvement.

Performance and Development:


  • Leverages subject matter expertise to sustain the talent development pipeline by participating in candidate interviews, assessing candidates, and providing hiring recommendations.

Basic Qualifications


  • BS or MS degree in Computer Science or relevant technical field involving coding or equivalent practical experience
  • 10+ years of total experience in software development
  • Demonstrated ability to write great code using Java, GoLang, C#, or similar OO languages
  • Proven ability to deliver products and experience with the full software development lifecycle
  • Experience working on large-scale, highly distributed services infrastructure
  • Experience working in an operational environment with mission-critical tier-one livesite servicing
  • Systematic problem-solving approach, strong communication skills, a sense of ownership, and drive
  • Experience designing architectures that demonstrate deep technical depth in one area, or span many products, to enable high availability, scalability, market-leading features and flexibility to meet future business demands

Preferred Qualifications


  • Experience as technical lead on a large-scale cloud service
  • Hands-on experience developing and maintaining services on a public cloud platform (e.g., AWS, Azure, Oracle)
  • Experience working on Kubernetes
  • Knowledge of Infrastructure as Code (IaC) languages, preferably Terraform
  • Strong knowledge of databases (SQL and NoSQL)
  • Strong knowledge of Computer Networking (OSI layers, HTTP, DNS, TCP/IP, DHCP, Routers, Gateways, Subnets, etc.)
  • Knowledge of Linux internals, Linux/Unix troubleshooting skills
  • Familiarity with host virtualization technologies (KVM, Containers, Docker, etc.)
  • Able to effectively communicate technical ideas verbally and in writing (technical proposals, design specs, architecture diagrams and presentations)
  • Experience with hiring, mentorship and raising the talent bar across the organization

Qualifications


Career Level - IC4.5

About Us

Only Oracle brings together the data, infrastructure, applications, and expertise to power everything from industry innovations to life-saving care. And with AI embedded across our products and services, we help customers turn that promise into a better future for all. Discover your potential at a company leading the way in AI and cloud solutions that impact billions of lives.

True innovation starts when everyone is empowered to contribute. That's why we're committed to growing a workforce that promotes opportunities for all with competitive benefits that support our people with flexible medical, life insurance, and retirement options. We also encourage employees to give back to their communities through our volunteer programs.

We're committed to including people with disabilities at all stages of the employment process. If you require accessibility assistance or accommodation for a disability at any point, let us know by emailing [Confidential Information] or by calling 1-888-404-2494 in the United States.

Oracle is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans status, or any other characteristic protected by law. Oracle will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.

Job ID: 149630539