Senior Site Reliability Engineer

Experience: 1-6 years

Job Description

Key Responsibilities:

  • Lead reliability engineering projects from design to execution, ensuring alignment with business objectives.
  • Ensure system stability, performance, and high availability by proactively monitoring systems and troubleshooting production issues.
  • Design, build, and maintain scalable, efficient, and reliable cloud-based infrastructure and services.
  • Automate manual processes to improve platform observability, reduce operational toil, and enhance reliability.
  • Implement and manage observability solutions using tools such as Grafana, Splunk, and Dynatrace for comprehensive monitoring, alerting, and logging.
  • Own end-to-end availability, performance, and scalability of critical services and internal tools.
  • Define and manage SLIs, SLOs, SLAs, and error budgets to maintain service reliability.
  • Provide on-call support and lead incident management and response activities.
  • Conduct blameless postmortems to identify root causes and drive preventive measures.
  • Collaborate with development and infrastructure teams to integrate reliability best practices into design and deployment processes.
  • Manage and maintain internal tools and infrastructure used by other development teams.

About Company

GreyOrange provides retailers, warehouse operators, and third-party logistics providers (3PLs) around the world with automated robotic fulfillment and inventory optimization solutions. We help our customers increase productivity, mitigate labor challenges, and reduce risk, all while enabling better experiences for their customers and employees.

Job ID: 130424763

Similar Jobs

Delhi, Kolkata, Mumbai

Skills:

Agile, Software Development Life Cycle, JavaScript, Splunk, Automation, JIRA, Python, Product management, Operations, Monitoring

Gurugram, India

Skills:

Distributed Systems, Networking, Prometheus, Bash, Grafana, Terraform, Linux, Azure, Python, Kubernetes, AWS, Infrastructure as Code, Go

Delhi, India

Skills:

Distributed Systems, Networking, Prometheus, Bash, Grafana, Linux, Terraform, Azure, Kubernetes, Python, AWS, Infrastructure as Code, Go

Gurugram, India

Skills:

ELK, Unix Administration, Networking, Prometheus, DNS, Grafana, Docker, Terraform, Python, AWS, Load Balancing, Debugging, Log Analysis, Bash, Automation, New Relic, Jenkins, Monitoring Tools, Linux, Distributed Systems, Kubernetes, Infrastructure as Code, GitHub Actions, Cloud infrastructure management, Container orchestration, GitLab CI, ArgoCD

Noida, India

Skills:

Rust, GCP, Terraform, Python, Kubernetes, Security baseline, Go, FinOps mindset, Reliability, GPU workload understanding, Observability