About Business Unit:
At the core of all that Epsilon does is a team that sets the foundation of our IT infrastructure. The team drives innovation and efficiency through pioneering technology across Epsilon's platforms and business verticals. From being the first point of contact for infrastructure needs to final deployment, the team provides end-to-end solutions for our client-facing platforms. ETS supports all aspects of revenue-generating platforms for Epsilon and sets the architectural direction for our enterprise deployments. By adopting the newest technologies, such as Cloud, Automation, and Artificial Intelligence, the team is at the forefront of redefining our digital business and capturing new opportunities.
Why we are looking for you:
- Proficient in Linux/OS administration & scripting
- Experience in Cloud/platform operations (AWS/Azure/GCP concepts)
- Knowledge of Docker & Kubernetes fundamentals
- Proficient in Infrastructure as Code (Terraform/Ansible)
- Familiarity with observability, SRE practices, and the SLI/SLO approach
- Solid understanding of networking essentials (DNS, TCP/IP, load balancing)
- Excellent problem-solving, communication, and mentoring abilities for cross-functional collaboration
- Knowledge of:
  - Windows/Wintel/Linux basics
  - Automation/tooling (Python/Go/Bash, CI/CD for infra)
  - Security & compliance fundamentals
  - ITSM & incident leadership (ITIL, RCA, partner communications)
What you will enjoy in this role:
- Proactively identifying service improvements and operational efficiencies.
- Collaborating with the global team and internal partners on root cause analysis (RCA), dashboard building, and observability.
- Managing a Wintel/Linux-based server farm, including the latest versions of Windows/Linux, Microsoft SQL Server, and load balancing.
- Overseeing daily IT operations, including server maintenance and patching activities.
Responsibilities
- Oversee server infrastructure (physical/virtual), networking, storage, and cloud environments.
- Proactively monitor system performance, capacity, and security to ensure maximum uptime.
- Troubleshoot issues and contribute to root cause analysis (RCA).
- Build reports, analyze data, and communicate findings to management.
- Investigate, diagnose, and take prescribed actions on all operational events, alarms, and incidents.
- Keep all infrastructure hardware and software up to date by applying relevant software/firmware patches.
- Review SOPs and operational documentation.
Qualifications
- Bachelor's degree in Engineering, Computer Science, IT, or an equivalent discipline
- 7-10 years of related experience
- Strong verbal and written communication skills
- Certification in ITIL V4, MCSE, CCNA, AWS, IAT, MCSA, RHCE
- Command Centre/NOC/SOC experience
- Familiarity with application lifecycle and IT Service Management concepts