
Operate and optimize Oracle Database and Exadata environments to meet stringent availability, performance, and scalability targets in 24x7 production.
Lead database reliability engineering initiatives including HA design patterns, capacity planning, demand forecasting, and performance analysis/system tuning.
Deliver advanced performance tuning (SQL optimization, indexing strategies, configuration and storage tuning) and drive measurable improvements in latency, throughput, and stability.
Design and maintain backup, recovery, and disaster recovery strategies; validate restore procedures and ensure readiness for mission-critical environments.
Apply SRE best practices including defining SLIs/SLOs, managing error budgets, and improving incident response through post-incident reviews and durable corrective actions.
Build automation and tools (Python/Shell/PowerShell) to eliminate toil, reduce MTTR, improve deployment reliability, and prevent recurring incidents.
Instrument and enhance observability using monitoring and APM stacks (e.g., Prometheus, Grafana) to improve signal quality and reduce alert noise.
Partner with engineering and architecture teams on service and database design, data modeling decisions, and system architecture improvements for distributed systems.
Education: Bachelor's or Master's degree in Computer Science, Engineering, or related field (or equivalent practical experience).
Experience: 6+ years in SRE, Cloud Engineering, DevOps, Database Reliability, or similar production-operations engineering roles.
Oracle Database expertise: Expert hands-on experience with Oracle Database and Exadata administration, high availability architectures, and production operations.
Performance tuning: Demonstrated capability in SQL tuning, indexing strategies, resource utilization analysis, and system tuning for high-scale workloads.
Backup/DR: Proven experience designing and operating backup, recovery, and disaster recovery solutions for 24x7 mission-critical systems.
Automation/scripting: Strong hands-on proficiency in Python and/or Shell/PowerShell for automation, tooling, and operational workflows.
Reliability & distributed systems: Solid understanding of cloud concepts, distributed systems behaviors, and SRE fundamentals (SLIs/SLOs, incident response, RCA).
Operational excellence: Strong troubleshooting, analytical thinking, and clear communication skills; comfortable acting as an escalation point during critical incidents.
Good-to-Have
Cloud platforms: OCI preferred; AWS/Azure/GCP experience also valuable.
IaC & configuration management: Terraform, Ansible, and Infrastructure-as-Code best practices.
Containers: Kubernetes and Docker exposure in production environments.
Observability depth: Experience with database observability, APM tooling, tracing, and alert quality/noise reduction initiatives.
AI familiarity: Exposure to LLMs, RAG, or AI agents (especially in operational tooling/automation contexts).
Certifications: Oracle Database/Exadata, OCI (or other cloud architect), SRE/DevOps-related certifications.
Self-Assessment Questions
Have I owned production Oracle Database/Exadata environments and successfully improved availability or performance through concrete tuning or architecture changes?
Can I confidently diagnose performance issues end-to-end (SQL, indexing, configuration, storage, and workload characteristics) and explain tradeoffs to stakeholders?
Have I designed and validated backup/restore and DR processes (including regular testing) for systems that require 24x7 reliability?
Do I routinely build automation in Python/Shell/PowerShell to reduce manual operational work, improve MTTR, or prevent recurring incidents?
Am I comfortable applying SRE practices (SLIs/SLOs, error budgets, incident response, RCA/postmortems) and driving improvements across teams?
Career Level - IC3
Oracle Corporation is an American multinational computer technology corporation headquartered in Austin, Texas. In 2020, Oracle was the second-largest software company in the world by revenue and market capitalization. The company sells database software and technology (particularly its own brands), cloud engineered systems, and enterprise software products, such as enterprise resource planning (ERP) software, human capital management (HCM) software, customer relationship management (CRM) software (also known as customer experience), enterprise performance management (EPM) software, and supply chain management (SCM) software.
Job ID: 143160251