JPMorgan Chase & Co.

Site Reliability Engineer II – AWS, Incident Response, Automation, Observability

This job is no longer accepting applications

  • Posted 14 days ago

Job Description

Join a team where your SRE expertise drives critical application reliability and operational excellence. Grow your skills in a collaborative, innovative environment.

As a Site Reliability Engineer at JPMorgan Chase within the Chief Technology Office team, you will manage and optimize production operations for critical applications. You will leverage your AWS and SRE skills to ensure service stability, performance, and resilience. You will collaborate with engineering and security teams to deliver secure, reliable solutions. Your contributions will help us maintain a robust and thriving operating environment.

Job Responsibilities

  • Manage and support production operations for critical applications, ensuring stability and predictable performance
  • Proactively monitor health signals, identify risks, and prevent incidents
  • Execute operational routines including release readiness, change coordination, and controlled rollouts
  • Lead or participate in incident triage, recovery, communications, and post-incident reviews with clear root cause analysis and follow-up actions
  • Drive problem management to eliminate repeat incidents
  • Build and maintain dashboards, alerts, and operational documentation for improved detection and diagnosis
  • Automate manual operational tasks and improve tooling using scripting or coding (Python, Bash, Go)
  • Define and track SLIs/SLOs, manage error budgets, and partner with development teams for reliability
  • Perform capacity planning, resilience testing, and performance tuning

Required Qualifications, Capabilities And Skills

  • Formal training or certification on security engineering concepts and 5+ years applied experience
  • Experience supporting critical application production environments with strong operational discipline
  • Strong troubleshooting skills across Linux, application behavior, and networking fundamentals
  • Hands-on experience operating and diagnosing issues in AWS environments
  • Solid working knowledge of AWS IAM and access control best practices
  • Experience with observability tools (monitoring, logging, alerting)
  • Automation mindset with scripting/coding capability (Python, Bash, Go) and familiarity with CI/CD practices
  • Clear communication during incidents and strong documentation habits

Preferred Qualifications, Capabilities And Skills

  • Experience with tracing tools for observability
  • Familiarity with resilience testing and performance tuning in cloud environments
  • Knowledge of operational security requirements and credential hygiene
  • Experience collaborating with platform and engineering teams

Job ID: 147611865

Similar Jobs

Hyderabad

Skills:

AWS, Bash, Python, Logging, Linux, Monitoring, alerting, observability tools, SRE, Go
