There's nothing more exciting than being at the center of a rapidly growing field in technology and applying your skillsets to drive innovation and modernize the world's most complex and mission-critical systems.
As a Site Reliability Engineer III at JPMorgan Chase within Employee Platforms, you will solve complex and broad business problems with simple and straightforward solutions.
Job responsibilities
- Own L1/L2 production support, participate in on-call rotations, and drive rapid triage, containment, and recovery for incidents in.
- Lead post-incident reviews and implement preventative actions to eliminate repeat issues and reduce operational risk.
- Define and maintain SLIs/SLOs and error budgets for critical user journeys, integrating them with change guardrails to balance velocity and reliability.
- Implement and standardize metrics, logs, and traces, and build actionable dashboards and alerts that improve signal-to-noise.
- Tune alert policies to reduce noise and improve MTTD/MTTR, leveraging APM/AIOps to accelerate root-cause analysis.
- Build and maintain CI/CD pipelines (e.g., Jenkins, GitHub Actions, GitLab CI), manage artifact/versioning, and orchestrate environment promotions.
- Enable pre/post-deploy checks, canary/blue-green strategies where feasible, and automated rollback to reduce change failure rate.
- Develop Python-based automation for self-healing, runbook execution, health checks, and operational workflows, with tests and code quality gates.
- Apply practical working experience with high-availability (clusters, failover) and networking (latency, load balancing, firewall) concepts.
- Be responsible for the overall Windows infrastructure and for the software implementation and configuration of third-party solutions.
- Learn and apply system processes, methodologies, and skills for the development of secure, stable code and systems.
- Add to the team culture of diversity, opportunity, inclusion, and respect.
Required qualifications, capabilities, and skills
Preferred qualifications, capabilities, and skills
- Excellent debugging and troubleshooting skills
- Hands-on experience with Genetec Security Desk
- Emerging knowledge of software applications and technical processes within a technical discipline (e.g., cloud, artificial intelligence, machine learning, mobile, etc.)