Electronikmedia (EM)

Level 3 AWS Infrastructure Support Engineer

This job is no longer accepting applications

  • Posted 3 months ago

Job Description

Role Overview

As a Level 3 AWS Infrastructure Support Engineer, you will own overnight monitoring and response for the AWS-based production environment of Electronikmedia's clients. You will:

  • Monitor system health using Datadog and AWS-native tools
  • Investigate alerts and anomalies using established runbooks
  • Resolve production incidents when possible
  • Escalate complex issues quickly and accurately
  • Maintain clean, auditable incident documentation

This role is ideal for someone who thrives in high-trust, high-impact operational environments.

Key Responsibilities

On-Call & Incident Response
  • Provide initial response within 15 minutes for all high-priority production alerts
  • Investigate, mitigate, and resolve production outages when feasible
  • Escalate unresolved or complex issues using the defined escalation matrix
  • Act as the owner of production system stability
Monitoring, Alerting & Observability
  • Analyze and respond to Datadog monitor alerts across infrastructure and application layers
  • Identify abnormal patterns, trend-line deviations, and early indicators of systemic risk
  • Proactively notify stakeholders of significant performance or stability concerns
  • Contribute insights for preventive and corrective actions
Root Cause & Trend Analysis
  • Track recurring alerts and incidents
  • Provide analysis and recommendations to reduce alert noise and improve system resilience
  • Participate in weekly validation of Datadog alert configurations and thresholds
Communication & Documentation
  • Maintain clear, concise, and timely communication during incidents
  • Document all incidents, alarms, and observations in Jira during each shift
  • Ensure handoff notes are complete and actionable for daytime engineering teams
Technical Environment

Core AWS Services
  • ECS (Fargate)
  • RDS
  • ElastiCache
  • EC2
  • Lambda
  • API Gateway
  • S3
Tooling
  • Datadog (monitoring, alerts, dashboards)
  • Jira (incident tracking and documentation)
Qualifications

Experience
  • 5+ years of hands-on AWS infrastructure administration and support
  • Proven experience supporting production-grade, high-availability systems
  • Strong background in incident response within enterprise or scale-up environments
Skills
  • Deep operational knowledge of AWS services and distributed systems
  • Strong troubleshooting and root-cause analysis skills under tight SLAs
  • Ability to follow runbooks while also knowing when to think beyond them
  • Calm, structured decision-making during production incidents
Certifications (Preferred)
  • AWS Certified Solutions Architect Associate or Professional
  • AWS Certified DevOps Engineer Professional (Nice to Have)
Service Level Expectations
  • Alert Escalation SLA: 15 minutes for high-priority alarms
  • Availability: Consistent overnight coverage (IST Day Shift)
  • Reliability: Zero missed critical alerts during assigned coverage windows
Deliverables
  • Monthly Service Performance Report, including:
      • Alerts monitored
      • Incidents resolved
      • Escalations
      • SLA adherence metrics
  • Weekly Datadog Validation, ensuring alert accuracy and functionality
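
As an illustration only, the monthly report figures above could be derived from per-alert shift records along the following lines. This is a minimal sketch, not Electronikmedia's actual tooling: the `Alert` record, its field names, and the `monthly_summary` function are hypothetical, and the only value taken from this posting is the 15-minute response window for high-priority alerts.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# From the posting: 15 minutes for high-priority alerts.
RESPONSE_SLA = timedelta(minutes=15)


@dataclass
class Alert:
    """Hypothetical per-alert record kept during a shift."""
    triggered_at: datetime
    first_response_at: Optional[datetime]  # None means no response was logged
    high_priority: bool
    resolved_on_shift: bool
    escalated: bool


def monthly_summary(alerts: list[Alert]) -> dict:
    """Aggregate the four report fields listed under Deliverables."""
    high = [a for a in alerts if a.high_priority]
    within_sla = [
        a for a in high
        if a.first_response_at is not None
        and a.first_response_at - a.triggered_at <= RESPONSE_SLA
    ]
    return {
        "alerts_monitored": len(alerts),
        "incidents_resolved": sum(a.resolved_on_shift for a in alerts),
        "escalations": sum(a.escalated for a in alerts),
        # Share of high-priority alerts answered within the SLA window;
        # None when there were no high-priority alerts in the period.
        "sla_adherence_pct": (
            round(100 * len(within_sla) / len(high), 1) if high else None
        ),
    }


# Made-up sample data for illustration.
t0 = datetime(2024, 1, 1, 2, 0)
sample = [
    Alert(t0, t0 + timedelta(minutes=5), True, True, False),
    Alert(t0, t0 + timedelta(minutes=20), True, False, True),
    Alert(t0, t0 + timedelta(minutes=3), False, True, False),
]
summary = monthly_summary(sample)
print(summary)
```

In this sketch, SLA adherence is computed over high-priority alerts only, since that is the only class for which the posting states a response window; an unanswered alert counts as a miss.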

Job ID: 137124111