
Title - Senior Support Engineer (Transitioning to SRE)
Location - Chennai
Job type - Permanent, Full-time
Work Mode - 5 days a week from office (UK and non-US shift timings)
Experience - 4 to 10 Years
Immediate Joiners only
Role Overview
We are looking for a Senior Support Engineer who will play a critical role in transforming our traditional support and monitoring teams into a modern Site Reliability Engineering (SRE) function. This role combines Level 1 monitoring responsibilities and Level 2 support duties, ensuring end-to-end accountability for system reliability. The ideal candidate will have strong technical troubleshooting skills, experience with operational support, and a mindset for automation and proactive problem-solving.
Key Responsibilities
Act as the first point of contact for system alerts and proactively monitor on-prem and cloud environments.
Triage alerts, resolve issues using playbooks, and escalate when necessary.
Own incidents end-to-end, ensuring timely resolution and communication.
Troubleshoot and resolve complex issues such as incomplete file processing, manual data loads, and system alerts.
Handle customer support issues related to file processing and integrations.
Implement automation for recurring issues and manual interventions.
Integrate proactive monitoring and self-healing mechanisms into systems.
Drive root cause analysis and implement permanent fixes to prevent recurrence.
Apply SRE principles and AWS Well-Architected Framework best practices for reliability, scalability, and cost optimization.
Identify gaps in current processes and propose improvements.
Work closely with development teams to ensure reliability and operability of new features.
Participate in on-call rotations and incident reviews to improve system resilience.
Required Skills & Qualifications
Minimum 4 years of experience in application support, operations, or reliability engineering.
Strong troubleshooting skills across on-prem systems and cloud environments.
Familiarity with Java, MySQL, and Python for debugging and support.
Experience with monitoring tools (e.g., Nagios, Prometheus, CloudWatch) and alert management.
Experience with automation scripting (Python, Shell, or similar).
Knowledge of incident management frameworks and ITIL processes.
Understanding of cloud platforms (AWS preferred) and migration considerations.
Preferred: Exposure to SRE principles and practices.
Experience with CI/CD pipelines and DevOps tools.
Knowledge of observability concepts (metrics, logs, traces).
Soft Skills
Strong communication and collaboration skills.
Ability to work under pressure and manage critical incidents.
Analytical mindset with a focus on continuous improvement.
Job ID: 141448557