Associate Director Application Support Engineering

  • Posted 10 days ago

Job Description

Are you ready to make an impact at DTCC?

Do you want to work on innovative projects, collaborate with a dynamic and supportive team, and receive investment in your professional development? At DTCC, we are at the forefront of innovation in the financial markets. We are committed to helping our employees grow and succeed. We believe that you have the skills and drive to make a real impact. We foster a thriving internal community and are committed to creating a workplace that looks like the world that we serve.

The Information Technology group delivers secure, reliable technology solutions that enable DTCC to be the trusted infrastructure of the global capital markets. The team delivers high-quality information through activities that include essential development work, building infrastructure capabilities to meet client needs, and implementing data standards and governance.

Pay and Benefits:

  • Competitive compensation, including base pay and annual incentive
  • Comprehensive health and life insurance and well-being benefits, based on location
  • Pension / Retirement benefits
  • Paid Time Off and Personal/Family Care, and other leaves of absence when needed to support your physical, financial, and emotional well-being.
  • DTCC offers a flexible/hybrid model of 3 days onsite and 2 days remote (onsite Tuesdays, Wednesdays and a third day unique to each team or employee).

The Impact you will have in this role:

The Enterprise Application Support (EAS) team is responsible for providing technical application support for the ITP and ECS lines of business. Within EAS, the Associate Director Application Support Engineering / SRE Lead (Site Reliability Engineer Lead) is a senior technical role responsible for driving the overall reliability, scalability, and performance of critical systems. The role does this by implementing standard methodologies, participating in incident response, automating processes, and collaborating with development teams to ensure system stability and uptime across the organization, often acting as a technical partner in promoting a strong SRE culture within the company. Key responsibilities include designing monitoring systems, capacity planning, and actively identifying and mitigating potential issues before they impact users. The SRE team works closely with development teams, infrastructure and network partners, security partners, Scrum Masters, and internal / external clients to improve observability, operational supportability, resiliency, and mean time to restore service by driving improvements to support capabilities.

Your Primary Responsibilities:

  • Scrum Participation: Join all project collaborators' planning and design sessions, sprint zero, and stand-ups for all new delivery to champion NFRs (non-functional requirements) that reflect strong observability and resiliency traits.
  • System Reliability Architecture: Drive the design of, and help implement, reliable, resilient, and scalable systems, considering redundancy, fault tolerance, and disaster recovery strategies. Make design recommendations that allow the application to recover without cleanup activities, or create a recovery runbook for the application support team to follow for improved application recovery times.
  • Monitoring and Alerting: Develop comprehensive monitoring systems to identify potential issues proactively, define actionable alerts, and establish SLIs (Service Level Indicators) and SLOs (Service Level Objectives).
  • Incident Management: Lead incident response during critical system outages, facilitating timely problem diagnosis and resolution, and conduct post-mortem analysis to identify root causes and prevent future occurrences.
  • Automation and Tooling: Develop and maintain automation scripts to streamline operational tasks, including self-healing, application deployments, scaling, and infrastructure management.
  • Collaboration with Development Teams: Work closely with development teams to integrate SRE practices into the software development lifecycle, promoting code quality, reliability, and observability.
  • Security Integration: Collaborate with security teams to ensure system resilience against cyber threats, implementing security best practices and monitoring for vulnerabilities.
  • Technical Expertise: Stay updated on emerging technologies and industry trends related to cloud computing, distributed systems, and reliability engineering.
  • Operational Readiness: Attend each project management meeting and present operational readiness together with application support (EAS L2), raising any operational risks and concerns. Test NFRs in UAT environments to validate the effectiveness and completeness of operational capabilities.
  • Risk Management: Partner with IT Embedded Risk Managers to identify strategic solutions for risk incidents.
  • Metrics and Reporting: Demonstrate operational improvements through defined KPIs.
  • Capacity Planning: Proactively assess system capacity needs, plan for future growth, and implement scaling strategies to ensure optimal performance under high load.
  • Performance Optimization: Analyze system performance metrics to identify bottlenecks and implement optimization strategies to improve system responsiveness and efficiency.

Qualifications:

  • Minimum of 8 years of related experience
  • Bachelor's degree preferred or equivalent experience

Talents Needed for Success:

  • Strong Programming Skills: Proficiency in one or more programming languages like Python, Java, Go, etc., for automation and development of monitoring tools.
  • System Administration: Expertise in Linux/Unix operating systems, network administration, and cloud platforms (AWS, Azure, GCP). Mainframe experience is a plus.
  • Monitoring and Observability: Deep understanding of monitoring tools (Splunk, Dynatrace, ITSI, etc.) and experience in designing robust monitoring systems.
  • Incident Management: Proven track record of participating in incident response teams under pressure and effectively solving complex issues.

ABOUT THE TEAM

To maintain strong alignment between IT and the business, we are bringing together all Solutions-focused teams under a unified technology organization, IT Solutions. The newly formed IT Solutions department combines the Application Development and Enterprise Application Support functions, allowing us to leverage synergies to support the Solutions business lines.

About Company

The Depository Trust & Clearing Corporation is an American post-trade financial services company providing clearing and settlement services to the financial markets.

Job ID: 142498935