
Kresta Softech Private Limited

Senior Support Engineer

  • Posted 4 hours ago

Job Description

Title - Senior Support Engineer (Transitioning to SRE)

Location - Chennai

Job type - Permanent, Full-time

Work mode - 5 days a week from the office (UK and non-US shift timings)

Experience - 4 to 10 Years

Immediate joiners only

Role Overview

We are looking for a Senior Support Engineer who will play a critical role in transforming our traditional support and monitoring teams into a modern Site Reliability Engineering (SRE) function. This role combines Level 1 monitoring responsibilities and Level 2 support duties, ensuring end-to-end accountability for system reliability. The ideal candidate will have strong technical troubleshooting skills, experience with operational support, and a mindset for automation and proactive problem-solving.

Key Responsibilities

Act as the first point of contact for system alerts and proactively monitor on-prem and cloud environments.

Triage alerts, resolve issues using playbooks, and escalate when necessary.

Own incidents end-to-end, ensuring timely resolution and communication.

Troubleshoot and resolve complex issues such as incomplete file processing, manual data loads, and system alerts.

Handle customer support issues related to file processing and integrations.

Implement automation for recurring issues and manual interventions.

Integrate proactive monitoring and self-healing mechanisms into systems.

Drive root cause analysis and implement permanent fixes to prevent recurrence.

Apply SRE principles and AWS Well-Architected Framework best practices for reliability, scalability, and cost optimization.

Identify gaps in current processes and propose improvements.

Work closely with development teams to ensure reliability and operability of new features.

Participate in on-call rotations and incident reviews to improve system resilience.

Required Skills & Qualifications

Minimum 4 years of experience in application support, operations, or reliability engineering.

Strong troubleshooting skills across on-prem systems and cloud environments.

Familiarity with Java, MySQL, and Python for debugging and support.

Experience with monitoring tools (e.g., Nagios, Prometheus, CloudWatch) and alert management.

Experience with automation scripting (Python, Shell, or similar).

Knowledge of incident management frameworks and ITIL processes.

Understanding of cloud platforms (AWS preferred) and migration considerations.

Preferred: Exposure to SRE principles and practices.

Experience with CI/CD pipelines and DevOps tools.

Knowledge of observability concepts (metrics, logs, traces).

Soft Skills

Strong communication and collaboration skills.

Ability to work under pressure and manage critical incidents.

Analytical mindset with a focus on continuous improvement.

More Info


Job ID: 141448557
