codechavo

Application Support Engineer (L1/L2) – Production Support

  • Posted 22 hours ago

Job Description

Company Description

CodeChavo is a trusted leader in IT Staffing and Services, catering to the staffing and solutions needs of renowned brands across India and the US. The company is committed to delivering high-quality technology solutions and connecting businesses with top-tier IT professionals. Known for its innovative approach and expertise, CodeChavo thrives on fostering successful partnerships with its clients and employees.

Role Description

Location: Mumbai, India

Work-mode: Hybrid

Experience: 4-8 years

Important Requirement (Please Read Before Applying)

This role requires:

● Willingness to work in rotational shifts (IST & EST time zones)

● Availability to work on weekends (mandatory as per shift schedule)

Please apply only if you are comfortable with the above requirements.

About the Role:

We are looking for an Application Support Engineer (L1/L2) to ensure the stability, reliability, and smooth functioning of our production systems. This role acts as the first line of defense for system monitoring and incident response, ensuring that issues are identified early, resolved quickly, and escalated appropriately. The ideal candidate should be comfortable working in a high-availability, fast-paced environment, handling alerts, monitoring data pipelines, and ensuring seamless platform operations.

Key Responsibilities:

Monitoring & System Health

● Monitor production systems using tools such as Datadog, CloudWatch, and internal dashboards

● Track system health across APIs, data pipelines, databases, and third-party integrations

● Identify anomalies and validate alerts to reduce false positives

Incident Management & Response

● Respond to system alerts in real time (failures, latency spikes, downtime)

● Perform initial incident triage and identify impacted components

● Execute predefined runbooks and recovery actions (job restarts, retries, etc.)

● Escalate issues to engineering teams when required

Data Pipeline Monitoring

● Monitor scheduled jobs and workflows (e.g., Dagster, SageMaker, batch pipelines)

● Identify missing, delayed, or failed data processes

● Trigger re-runs or escalate issues to relevant teams

Third-Party & Vendor Monitoring

● Monitor failures in external APIs, proxies, and vendor systems

● Coordinate with internal teams for resolution

● Track and highlight recurring vendor-related issues

Database Monitoring

● Perform basic database health checks, including:

○ Connection issues

○ Slow queries

○ Replication lag

○ Storage utilization

● Raise alerts for any anomalies

Runbook Execution & Documentation

● Follow standard operating procedures and runbooks for known issues

● Maintain clear logs of actions taken during incidents

● Ensure proper closure and documentation of incidents

Reporting & Shift Handover

● Maintain incident logs and reports

● Provide structured shift handovers to ensure continuity

● Highlight recurring issues and patterns for further analysis

What You Will NOT Be Responsible For

(To set the right expectations clearly)

● No deep debugging or code-level fixes

● No infrastructure changes

● No ownership of alert configurations (handled by SRE/Engineering teams)


Job ID: 145830275
