Automation Engineer, Systems Operations

ICE

Hyderabad, India

3-5 Years

Posted 17 hours ago

Job Description

Job Purpose

The Automation Engineer is responsible for designing, building, and operating automation solutions that ensure the reliability, scalability, and operational efficiency of ICE/NYSE platforms. This role blends hands-on systems operations with software-driven automation, focusing on eliminating manual, repetitive tasks and improving how production systems are monitored, supported, and maintained.

Acting as both an engineer and an operator, the Automation Engineer partners closely with developers, infrastructure teams, and IT operations to automate operational workflows, enhance observability, and provide real-time production support. By proactively automating routine operational processes and reducing human intervention, this role directly contributes to minimizing downtime, improving system resilience, and strengthening the overall architecture of ICE/NYSE exchanges, divisions, and infrastructure.

This is a 24x7 production environment. The position is five days per week in-office and may require participation in off-hours support, shift rotations, or weekend work as needed to meet business and operational requirements.

Responsibilities

Automation

Identify automation opportunities to assist with Disaster Recovery and incident remediation.
Contribute to automation projects, including scripting and building automation jobs.
Collaborate with other team members (internal and external) on automation initiatives.
Investigate and troubleshoot issues with existing automation.

Incident Management

Monitor systems and applications within the production environment.
Diagnose and fix incidents raised through monitoring tools, conference bridges, and chats.
Work with and escalate to internal and external teams to implement incident fixes, workarounds, and data recovery.
Open and update production incident tickets according to company standards.

Problem Management

Investigate and update incident tickets with root cause and incident description, ensuring appropriate corrective action follow-up tickets are assigned.
Manage incident tickets to closure, ensuring incident details are complete and accurate, and all corrective actions have been completed.
Participate in continuous improvement programs, such as trend analysis of recurring issues.
Provide and report on performance metrics of the environment.

System and Application Production Readiness

Work with internal and external teams to expand and maintain operational runbooks and other documentation.
Perform scheduled checks of application and infrastructure availability and tasks.
Configure monitoring tools and alarms.

Change Management

Ensure successful prioritization, approval, scheduling, and execution of production and DR environment changes.
Approve and execute production deployment tasks.

Disaster Recovery Management

Participate in disaster recovery, business continuity, and workplace recovery events.

Knowledge and Experience

3+ years of cumulative, full-time experience.
BS in Computer Science, Computer Engineering, Math, or equivalent professional experience.
Proficiency in scripting languages (e.g., Python, PowerShell, Shell scripting).
Understanding of network protocols and security concepts.
Strong knowledge of operating systems (Windows, Linux, macOS).
Strong problem-solving and analytical skills.
Excellent communication skills (both written and verbal).
Demonstrated ability to communicate technical issues clearly to non-technical stakeholders and collaborate across cross-functional teams.

Specific Technologies

Rundeck, PagerDuty, BigPanda, Ansible, n8n, Splunk, GitHub Actions, Jenkins