Amgen Inc

Site Reliability Engineer

2-6 Years

Job Description

Roles & Responsibilities:

  • Ensure high system reliability and uptime.
  • Develop and maintain monitoring systems.
  • Lead incident response and root cause analysis.
  • Automate repetitive tasks for efficiency.
  • Perform capacity planning and resource scaling.
  • Lead infrastructure-as-code efforts (e.g., Terraform, Kubernetes).
  • Collaborate with development and operations teams.
  • Maintain clear documentation and share knowledge.
  • Optimize system and application performance.
  • Ensure security and compliance standards are met.
  • Define, measure, and monitor Service Level Objectives (SLOs) and Service Level Agreements (SLAs) to align with business goals.
  • Drive continuous process and system improvements.
  • Define guidelines, standards, strategies, security policies, and organizational change policies to support the Data Lake.

What we expect of you

Basic Qualifications and Experience:

  • Master's degree in computer science or a related engineering field and 1 to 3 years of relevant experience OR
  • Bachelor's degree in computer science or a related engineering field and 3 to 5 years of relevant experience OR
  • Diploma and a minimum of 8 years of relevant work experience.

Must-Have Skills:

  • Proficiency in programming/scripting (Python, Java).
  • Experience in Linux/Unix system administration.
  • Experience with cloud and cloud data platforms (AWS, Azure, Databricks, Snowflake).
  • Proficiency in containerization and orchestration (Docker, Kubernetes).
  • Knowledge of Infrastructure as Code (Terraform, Ansible).
  • Familiarity with monitoring and logging tools (Prometheus, Grafana).
  • Understanding of CI/CD pipelines (Jenkins, GitLab CI/CD).
  • Strong networking knowledge and network troubleshooting skills.
  • Understanding of security principles and compliance.
  • Familiarity with database management (SQL and NoSQL).
  • Strong troubleshooting and debugging skills.
  • Experience in performance optimization.
  • Experience with backup and storage solutions.

Good-to-Have Skills:

  • Familiarity with the use of AI for development productivity, such as GitHub Copilot, Databricks Assistant, Amazon Q Developer, or equivalent.
  • Knowledge of Agile and DevOps practices.
  • Skills in disaster recovery planning.
  • Familiarity with load testing tools (JMeter, Gatling).
  • Basic understanding of AI/ML for monitoring.
  • Knowledge of distributed systems and microservices.
  • Data visualization skills (Tableau, Power BI).
  • Strong communication and leadership skills.
  • Understanding of compliance and auditing requirements.

Soft Skills:

  • Excellent analytical and problem-solving skills.
  • Excellent written and verbal communication skills in English, including the ability to translate technical content into business language for audiences at various levels.
  • Ability to work effectively with global, virtual teams.
  • High degree of initiative and self-motivation.
  • Ability to handle multiple priorities successfully.
  • Team-oriented, with a focus on achieving team goals.
  • Strong time and task management skills, with the ability to estimate and meet project timelines and to ensure consistency and quality across multiple projects.

Job ID: 111857559