Staff System Engineer

Coupang

Bengaluru, India

Fresher

Posted 5 months ago

Job Description

Company Introduction

We exist to wow our customers. We know we're doing the right thing when we hear our customers say, "How did we ever live without Coupang?" Born out of an obsession to make shopping, eating, and living easier than ever, we are collectively disrupting the multi-billion-dollar commerce industry from the ground up and establishing an unparalleled reputation for being a leading and reliable force in South Korean commerce.

We are proud to have the best of both worlds: a startup culture with the resources of a large global public company. This fuels us to continue our growth and launch new services at the speed we have maintained since our inception. We are all entrepreneurial, surrounded by opportunities to drive new initiatives and innovations. At our core, we are bold and ambitious people who like to get our hands dirty and make a hands-on impact. At Coupang, you will see yourself, your colleagues, your team, and the company grow every day.

Our mission to build the future of commerce is real. We push the boundaries of what's possible to solve problems and break traditional tradeoffs. Join Coupang now to create an epic experience in this always-on, high-tech, and hyper-connected world.

Role Overview:

The ICT Reliability Engineering team is dedicated to maintaining the continuity and stability of Coupang's enterprise IT services. The team operates and continuously improves monitoring systems for both IT infrastructure and applications, ensuring high visibility and rapid incident detection. In the event of service disruptions, the team collaborates closely with engineering and operations teams to resolve issues efficiently and manage key performance metrics. Additionally, the team leads regular disaster recovery (DR) tests to validate system resilience and ensure business continuity.

Key Responsibilities:

Identify operational inefficiencies and automation opportunities within monitoring workflows and infrastructure.
Design and implement automated solutions for deployment, configuration, and scaling of monitoring tools using Infrastructure-as-Code (IaC) technologies such as Terraform, Ansible, Puppet, or similar.
Leverage REST APIs of platforms like Zabbix, SolarWinds, Prometheus, and Grafana to streamline and standardize monitoring setup and management.
Develop reusable automation assets (scripts, templates, and modules) to ensure consistent monitoring practices across diverse environments.
Automate Grafana dashboard creation and management, including templating, data source integration, and role-based access control.
Integrate monitoring systems with alerting, ticketing, and reporting platforms to enable seamless incident management and visibility.
Establish tagging strategies and observability standards to ensure uniform data collection and traceability across services.
Support incident response by building automated diagnostics and enriching telemetry data for faster root cause analysis.
Collaborate cross-functionally with DevOps and SRE teams to align monitoring automation with CI/CD pipelines and operational goals.

Tech Skills:

Infrastructure as Code (IaC) & Automation

Terraform
Ansible
Puppet
Scripting and remote administration: Python, Bash, PowerShell, SSH

Monitoring & Observability Tools

Zabbix
SolarWinds
Prometheus
Grafana (including dashboard templating, provisioning, and API-based automation)
Datadog or Dynatrace (as alternatives or complementary tools)

API Integration & Automation

Experience working with REST APIs for automation and integration
Familiarity with JSON, YAML, and HTTP methods (GET, POST, PUT, DELETE)

CI/CD & DevOps Tooling

Jenkins, GitLab CI, GitHub Actions, or similar
Docker and Kubernetes (for containerized environments)

Alerting & Incident Management Integration

ServiceNow, Jira, VictorOps, xMatters, or similar
Knowledge of event correlation and automated diagnostics

Cloud Platforms (optional)

AWS, Azure, or Google Cloud Platform
Cloud-native monitoring tools like CloudWatch, Azure Monitor, or GCP Operations Suite

Preferred Qualifications:

Soft Skills & Operational Mindset

Strong problem-solving and gap analysis capabilities
Ability to identify low-hanging fruit for automation
Experience in cross-functional collaboration (DevOps, SRE, IT Ops)
Understanding of observability principles and tagging strategies

Type of work model: Hybrid

Coupang's hybrid work model is designed to enable a culture of collaboration that acts as a catalyst to enrich the employee experience. Employees are required to work at least 3 days in the office per week, with the flexibility to work from home 2 days a week, depending on the role requirement. Some businesses may require more time in the office due to the nature of the work.

Privacy Notice

Your personal information will be collected and managed by Coupang as stated in the Application Privacy Notice located below: https://www.coupang.jobs/privacy-policy/