Coupang

Staff System Engineer

This job is no longer accepting applications

  • Posted 2 months ago

Job Description

Company Introduction

We exist to wow our customers. We know we're doing the right thing when we hear our customers say, "How did we ever live without Coupang?" Born out of an obsession to make shopping, eating, and living easier than ever, we are collectively disrupting the multi-billion-dollar commerce industry from the ground up and establishing an unparalleled reputation for being a leading and reliable force in South Korean commerce.

We are proud to have the best of both worlds: a startup culture with the resources of a large global public company. This fuels us to continue our growth and launch new services at the speed we have been at since our inception. We are all entrepreneurs surrounded by opportunities to drive new initiatives and innovations. At our core, we are bold and ambitious people who like to get our hands dirty and make a hands-on impact. At Coupang, you will see yourself, your colleagues, your team, and the company grow every day.

Our mission to build the future of commerce is real. We push the boundaries of what's possible to solve problems and break traditional tradeoffs. Join Coupang now to create an epic experience in this always-on, high-tech, and hyper-connected world.

Role Overview:

The ICT Reliability Engineering team is dedicated to maintaining the continuity and stability of Coupang's enterprise IT services. The team operates and continuously improves monitoring systems for both IT infrastructure and applications, ensuring high visibility and rapid incident detection. In the event of service disruptions, the team collaborates closely with engineering and operations teams to resolve issues efficiently and manage key performance metrics. Additionally, the team leads regular disaster recovery (DR) tests to validate system resilience and ensure business continuity.

Key Responsibilities:

  • Identify operational inefficiencies and automation opportunities within monitoring workflows and infrastructure.
  • Design and implement automated solutions for deployment, configuration, and scaling of monitoring tools using Infrastructure-as-Code (IaC) technologies such as Terraform, Ansible, Puppet, or similar.
  • Leverage REST APIs of platforms like Zabbix, SolarWinds, Prometheus, and Grafana to streamline and standardize monitoring setup and management.
  • Develop reusable automation assets (scripts, templates, and modules) to ensure consistent monitoring practices across diverse environments.
  • Automate Grafana dashboard creation and management, including templating, data source integration, and role-based access control.
  • Integrate monitoring systems with alerting, ticketing, and reporting platforms to enable seamless incident management and visibility.
  • Establish tagging strategies and observability standards to ensure uniform data collection and traceability across services.
  • Support incident response by building automated diagnostics and enriching telemetry data for faster root cause analysis.
  • Collaborate cross-functionally with DevOps and SRE teams to align monitoring automation with CI/CD pipelines and operational goals.

Tech Skills:

Infrastructure as Code (IaC) & Automation

  • Terraform
  • Ansible
  • Puppet
  • Scripting languages and tools: Python, Bash, PowerShell, SSH

Monitoring & Observability Tools

  • Zabbix
  • SolarWinds
  • Prometheus
  • Grafana (including dashboard templating, provisioning, and API-based automation)
  • Datadog or Dynatrace (as alternatives or complementary tools)

API Integration & Automation

  • Experience working with REST APIs for automation and integration
  • Familiarity with JSON, YAML, and HTTP methods (GET, POST, PUT, DELETE)

CI/CD & DevOps Tooling

  • Jenkins, GitLab CI, GitHub Actions, or similar
  • Docker and Kubernetes (for containerized environments)

Alerting & Incident Management Integration

  • ServiceNow, Jira, VictorOps, xMatters, or similar
  • Knowledge of event correlation and automated diagnostics

Cloud Platforms (optional)

  • AWS, Azure, or Google Cloud Platform
  • Cloud-native monitoring tools like CloudWatch, Azure Monitor, or GCP Operations Suite

Preferred Qualifications:

Soft Skills & Operational Mindset

  • Strong problem-solving and gap analysis capabilities
  • Ability to identify low-hanging fruit for automation
  • Experience in cross-functional collaboration (DevOps, SRE, IT Ops)
  • Understanding of observability principles and tagging strategies

Type of work model: Hybrid

Coupang's hybrid work model is designed to enable a culture of collaboration that acts as a catalyst to enrich the experience of employees. Employees are required to work at least 3 days in the office per week, with the flexibility to work from home 2 days a week, depending on the role requirement. Some businesses may require more time in the office due to the nature of the work.

Privacy Notice

Your personal information will be collected and managed by Coupang as stated in the Application Privacy Notice located below: https://www.coupang.jobs/privacy-policy/

Job ID: 126909137