IT Infrastructure Support Site Reliability Engineer

Experience: 6-11 Years

Job Description

Key Responsibilities

Service Reliability & Automation

Establish, monitor, and enforce Service Level Indicators (SLIs) and Service Level Objectives (SLOs) for infrastructure tooling, including configuration compliance, patch success rates, and deployment latency
Provide Level 3 expertise for tooling-specific incidents, with a focus on automating incident remediation and reducing mean time to recovery (MTTR)
Automate repetitive tasks across managed infrastructure to measurably reduce operational overhead (e.g., shorter server build times)
Conduct root cause analysis and lead blameless postmortems for service-impacting incidents to drive systemic improvements

Infrastructure & Configuration Management

Engineer and maintain automation scripts for asset management, configuration databases, and monitoring systems
Design, develop, and deploy full-stack applications, custom plugins, and automation scripts for direct device interaction
Maintain Infrastructure-as-Code (IaC) configurations for Windows and Linux servers using tools such as Ansible, Terraform, or Puppet
Implement drift detection and auto-remediation capabilities for configuration compliance

Network & Security Device Automation

Build API-driven tools for network configuration, firmware updates, pre/post-change validation, and real-time health monitoring
Deploy monitoring agents, centralized logging, and dashboards with alerts based on critical SLIs (latency, error rates, traffic, saturation)
Develop automation scripts for intelligent ticket handling, validation, and escalation workflows within enterprise ticketing systems

Monitoring & Continuous Improvement

Implement and manage monitoring solutions (Prometheus, Grafana, Datadog) and centralized logging platforms (ELK Stack)
Build custom dashboards, alerts, and reporting for infrastructure and security devices
Participate in continuous improvement initiatives to enhance automation, tooling reliability, and system resilience