CLSA

Lead Support Analyst - Shared Services and Production Management, Information Technology

8-10 Years
  • Posted 3 hours ago

Job Description

Key Areas of Responsibilities

  • Own and support monitoring and SRE operations, ensuring system reliability, availability, and performance.
  • Build, enhance, and maintain monitoring solutions using ITRS Geneos, Prometheus, VictoriaMetrics, Elasticsearch, and Grafana.
  • Develop, optimize, and maintain alerting rules, dashboards, and observability pipelines.
  • Troubleshoot and resolve complex issues during major incidents, providing clear and timely communication.
  • Administer and troubleshoot Linux servers (RHEL 7/8/9), including upgrades, configuration, patching, and maintenance, and determine appropriate monitoring requirements for system changes.
  • Analyze logs, investigate issues, and perform fault finding to identify performance exceptions.
  • Collaborate with engineering, application, and infrastructure teams to improve system resilience, stability, security, efficiency, and scalability.
  • Contribute to automation strategies, deployment processes, and continuous operational improvements.
  • Participate in on‑call rotations, including off‑hours and scheduled weekend support.
  • Participate in Disaster Recovery (DR) and Business Continuity Planning (BCP) drills.
  • Continuously research and adopt modern monitoring and SRE tools and practices.

Requirements

  • Bachelor's degree in computer science or engineering.
  • Minimum 8 years of experience in IT or investment banking.
  • Strong experience with monitoring and observability platforms, including ITRS Geneos, Prometheus, VictoriaMetrics, Elasticsearch, Grafana, and Kibana.
  • Hands-on experience building and implementing Prometheus pipelines, including exporters, scraping configurations, relabelling, metric routing, and integrations with long‑term storage (e.g., VictoriaMetrics).
  • Experience building and maintaining Logstash pipelines, including ingestion, parsing, filtering, enrichment, and routing of logs into Elasticsearch.
  • Ability to design, build, and maintain Grafana and Kibana dashboards for metrics, logs, and performance analytics across distributed systems.
  • Solid understanding of metrics, logging, alerting, dashboards, and observability pipelines.
  • Strong Linux administration skills (RHEL 7/8/9), including troubleshooting, upgrades, configuration, patching, and performance optimization.
  • Good understanding of SRE principles, high availability, scalability, incident management, and Disaster Recovery (DR) / Business Continuity Planning (BCP) activities.
  • Experience with automation (e.g., Bash, Python, Ansible, CI/CD tools) is an advantage.
  • Understanding of networking fundamentals, performance tuning, and troubleshooting distributed systems.
  • Prior experience in Production Support, SRE, Monitoring Engineering, or Shared Services Operations with participation in on‑call rotations, including after-hours and weekend support.
  • Strong analytical, problem‑solving and communication skills with the ability to work collaboratively under pressure.
  • Self-motivated, adaptable and able to prioritize, learn continuously and manage multiple responsibilities effectively.
  • Fluent in English, with excellent written and spoken skills.

Job ID: 146979213