Site Reliability Developer 3

Oracle

Bengaluru, India

5-8 Years

Posted 6 days ago

Job Description

As a Site Reliability Engineer, you will be responsible for defining, deploying, and operating key services with a strong emphasis on system architecture, production operations, capacity planning, performance optimization, deployment, and release engineering. You will help deliver exceptional experiences for our customers and partners while ensuring our services meet reliability, scalability, and performance standards.

Responsibilities

Own the architecture, design, implementation, and production operations of core system and platform services

Improve system reliability through automation, self-healing mechanisms, and real-time monitoring and alerting

Identify and respond to production issues, driving root-cause analysis and implementing preventative solutions

Contribute to the design, development, and operation of platform services, including provisioning, configuration, deployment, and ongoing support

Partner with a globally distributed team to prototype, evaluate, and roll out new platform capabilities

Design, write, and deploy software to improve the availability, scalability, and operational efficiency of services

Develop and evolve standards, architectures, and best practices for large-scale distributed systems

Lead and support capacity planning, demand forecasting, performance analysis, and system tuning

Stay current with emerging technologies and apply innovative approaches to solving complex infrastructure and cloud-service challenges

Qualifications & Experience

5-8 years of experience in Site Reliability Engineering, DevOps, or a closely related role

Experience developing and/or operating large-scale, distributed systems and services

Hands-on experience with containerized environments using Kubernetes, Docker, Mesos, or similar technologies

Experience with infrastructure automation and Infrastructure-as-Code tools such as Terraform, Chef, Ansible, Puppet, or Packer

Familiarity with cloud orchestration frameworks and supporting them in an SRE or production environment

Experience building and maintaining CI/CD pipelines using tools such as Git (or other VCS), GitLab Runners, Jenkins, and Rundeck

Experience supporting production, test, and development environments at medium to large scale

Proficiency in scripting for automation and deployments using Bash, PowerShell, or similar

Knowledge of cloud compute platforms, networking, monitoring, logging, and data processing/analytics

Proficiency in at least one modern programming language such as Python, Go, or Java

Experience operating fault-tolerant, highly available, high-throughput, and scalable systems

Hands-on experience with at least one major cloud provider (AWS, OCI, GCP, or equivalent)

Career Level - IC3

More Info

Job Type:

Permanent Job

Industry:

IT/Computers - Hardware & Networking

Function:

Site Reliability Engineering

Employment Type:

Full time

About Company

Oracle

Job Source: careers.oracle.com

Oracle Corporation is an American multinational computer technology corporation headquartered in Austin, Texas. In 2020, Oracle was the second-largest software company in the world by revenue and market capitalization. The company sells database software and technology (particularly its own brands), cloud engineered systems, and enterprise software products, such as enterprise resource planning (ERP) software, human capital management (HCM) software, customer relationship management (CRM) software (also known as customer experience), enterprise performance management (EPM) software, and supply chain management (SCM) software.

Job ID: 138703723
