Manager - Technology Operations Support


Job Description

At American Express, our culture is built on a 175-year history of innovation, shared values and Leadership Behaviors, and an unwavering commitment to back our customers, communities, and colleagues. From delivering differentiated products to providing world-class customer service, we operate with a strong risk mindset, ensuring we continue to uphold our brand promise of trust, security, and service.

Here, your voice and ideas matter, your work makes an impact, and together, you will help us define the future of American Express.

Role Purpose

The Tech Analyst - Environment & Monitoring is a critical role within the Production Assurance Team, responsible for ensuring 24x7 system availability, stability, and operational readiness across infrastructure, application, and network environments. The role acts as the first line of defense for environment health checks, monitoring operations, and ensuring seamless readiness across UAT and Production platforms. The Customer Journey PME team is a cross-functional, collaborative and innovative team responsible for partnering with engineering and product partners to ensure alignment between the organizations and contribute to the key strategic efforts. This strategic role ensures uninterrupted service delivery, rapid incident response, and continuous improvement of operational processes, partnering closely with business and technology teams.

Key Responsibilities

Environment Readiness & Management

  • Perform environment readiness checks ahead of UAT and Production cutovers, ensuring all systems are validated and deployment-ready

  • Conduct regular batch job health verification and system configuration audits to ensure compliance with operational standards

  • Support DR (Disaster Recovery) readiness activities and facilitate quick failover response as required

  • Maintain system configuration documentation and ensure audit trails are up to date

  • Ensure an SOP for Environment Readiness is created and executed, with evidence of its execution retained

Monitoring & Incident Support

  • Lead 24x7 incident management, including proactive monitoring, triage, escalation, and resolution, to preserve service availability and minimize downtime

  • Execute 24x7 proactive monitoring of infrastructure, applications, batch jobs, SFTP, network, DB, firewall/security, and end-user computing metrics

  • Direct comprehensive root cause analysis and problem management, instituting robust remediation and prevention strategies for recurring operational incidents

  • Implement software development practices to build observability, alerting, tracing, automation and self-healing capabilities to maintain the highest levels of platform availability.

  • Participate in incident detection, triage, and escalation, coordinating with Shift Incharge and L1/L2/L3 support teams for timely resolution

  • Oversee implementation and execution of Disaster Recovery (DR) and business continuity plans, orchestrating readiness drills and post-event reviews with all relevant stakeholders

  • Assist in major incident management and crisis response under the guidance of the PAT Lead and Shift Incharge

  • Coordinate with vendor support teams for handshake and resolution of platform issues

  • Establish strong collaboration with business units, IT operations, compliance and vendor teams to synchronize issue resolution and foster shared ownership of production health

  • Utilize service analytics, key performance indicators (KPIs), and post-fix insights to drive ongoing process optimization, operational efficiency, and measurable improvement in service reliability

  • Lead regular operational governance forums, providing transparent reporting on incident trends, recovery status, and change outcomes for executive leadership

  • Prepare incident-related regulatory documentation and communications at the appropriate frequency

  • Be part of a global operations team that supports a 24/7 model, with a willingness to work holidays and weekends

Access, SFTP & Credential Management

  • Manage access provisioning, SFTP configuration, and credential management in accordance with security and compliance policies

  • Support periodic maintenance windows and coordinate planned downtime activities with stakeholders

Ticket Management & Reporting

  • Manage, track, and report on tickets through the IT ticketing system, ensuring SLA adherence and timely resolution

  • Contribute to operational analytics and incident trend reporting to identify recurring issues and root causes

  • Maintain shift logs, knowledge base articles, and handover documentation for continuity

Automation & Continuous Improvement

  • Identify opportunities for process automation and operational efficiency improvements within the monitoring and environment management space

  • Implement and manage productivity enhancement tools to reduce manual intervention

  • Contribute to continuous improvement initiatives under the Optimization, Productivity, and Automation workstream of PAT

Communication & Stakeholder Management

  • Communicate environment status, outages, and recovery updates to relevant stakeholders in a timely and accurate manner

  • Support regulatory and vulnerability-related communications

Additional Responsibilities

  • Make hands-on contributions to enterprise solutions, tooling, and initiatives, leveraging your technical experience

  • Implement shift-left automated testing to prevent defects from reaching production

  • Ensure all new critical subsystems, microservices, databases and external calls meet the five-nines (99.999%) availability requirement

  • Conduct technical code reviews and drive innovation across the organization to adopt industry best practices.

  • Review all significant functionality changes and peer review critical production hotfixes.

Expected Impact

  • Achieve highest levels of production stability and service resilience

  • Deliver efficient incident and problem management, reducing repeat issues and enhancing end-user confidence in technology services

  • Maintain disciplined control of production changes, safeguarding business operations from risk and regulatory exposure

  • Enable robust DR preparedness and execution

  • Provide actionable insights, intelligence, and decision support for executive stakeholders, enabling data-driven investment in production assurance

Required Skills & Experience

  • 10-12 years in IT operations, environment management, or infrastructure monitoring roles

  • Proficiency in monitoring tools (e.g., Dynatrace, AppDynamics) and experience with SFTP, batch job monitoring, and network/DB health checks

  • Hands-on experience with ticketing platforms (ServiceNow, JIRA, or equivalent)

  • Working knowledge of cloud infrastructure (AWS/Azure), server environments, firewalls, and network security fundamentals

  • Exposure to scripting (Shell, Python, or PowerShell) for operational automation is preferred

  • Strong communication, analytical thinking, and ability to work in a 24x7 shift-based environment

We back you with benefits that support your holistic well-being so you can be and deliver your best. This means caring for you and your loved ones' physical, financial, and mental health, as well as providing the flexibility you need to thrive personally and professionally:

  • Competitive base salaries

  • Bonus incentives

  • Support for financial well-being and retirement

  • Comprehensive medical, dental, vision, life insurance, and disability benefits (depending on location)

  • Flexible working model with hybrid, onsite or virtual arrangements depending on role and business need

  • Generous paid parental leave policies (depending on your location)

  • Free access to global on-site wellness centers staffed with nurses and doctors (depending on location)

  • Free and confidential counseling support through our Healthy Minds program

  • Career development and training opportunities

American Express is an equal opportunity employer and makes employment decisions without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, veteran status, disability status, age, or any other status protected by law.

Offer of employment with American Express is conditioned upon the successful completion of a background verification check, subject to applicable laws and regulations.

About Company

American Express Company (Amex) is an American multinational corporation specializing in payment card services, headquartered at 200 Vesey Street in the Battery Park City neighborhood of Lower Manhattan in New York City. The company was founded in 1850 and is one of the 30 components of the Dow Jones Industrial Average. The company's logo, adopted in 1958, is a gladiator or centurion whose image appears on the company's well-known traveler's cheques, charge cards, and credit cards.

Job ID: 145409247
