Site Reliability Engineer

Creospan Private Limited

Fresher

Pune, India

This job is no longer accepting applications

Job Description

Pune - Kharadi (Hybrid, 3 days/week in office)

Full-time - Creospan

Role Overview

We are seeking a highly motivated Site Reliability Engineer (SRE) with strong expertise in Dynatrace, AWS Cloud, monitoring, observability, and production support. The ideal candidate will be responsible for ensuring application availability, system reliability, performance optimization, and operational excellence across enterprise-scale environments.

This role requires hands-on experience in application monitoring, incident management, troubleshooting, and automation, as well as close collaboration with Development, DevOps, and Infrastructure teams to maintain highly available and resilient systems.

Key Responsibilities

Monitoring & Observability

Design, develop, and maintain Dynatrace dashboards, alerts, monitoring profiles, and observability solutions.
Configure and manage application performance monitoring (APM), infrastructure monitoring, and distributed tracing.
Create and maintain operational dashboards, reports, and service health metrics.
Establish proactive alerting and monitoring strategies to identify issues before they impact users.

Production Support & Incident Management

Monitor application and infrastructure performance to identify bottlenecks, anomalies, and system issues.
Investigate and resolve production incidents, defects, and performance-related problems.
Participate in critical incident management and on-call support rotations.
Perform Root Cause Analysis (RCA) and implement corrective and preventive actions.
Ensure adherence to SLAs, SLOs, and operational excellence standards.

AWS Cloud & Infrastructure Reliability

Support and maintain cloud-native applications hosted on AWS.
Analyze system performance, scalability, and reliability within AWS environments.
Collaborate with infrastructure teams to optimize cloud resources and improve system resilience.
Support high-availability and disaster recovery strategies.

DevOps & Automation

Support CI/CD deployments, release management activities, and production rollouts.
Collaborate with DevOps teams to improve deployment automation and operational efficiency.
Automate monitoring, reporting, and operational tasks using scripting and automation tools.
Continuously improve system reliability through automation and process optimization.

Collaboration & Documentation

Work closely with Development, QA, DevOps, and Infrastructure teams to improve application stability.
Document troubleshooting procedures, monitoring configurations, runbooks, and standard operating procedures.
Participate in architecture and design discussions to improve system observability and reliability.

Required Skills & Qualifications

Core SRE & Production Support

Strong experience as a Site Reliability Engineer (SRE), Production Support Engineer, Application Support Engineer, or DevOps Engineer.
Hands-on experience supporting business-critical production environments.
Strong understanding of Incident Management, Problem Management, Change Management, and RCA processes.

Monitoring & Observability

Hands-on experience with Dynatrace.
Expertise in dashboard creation, alert configuration, performance monitoring, and observability practices.
Experience monitoring and troubleshooting application, infrastructure, and cloud performance issues.
Exposure to tools such as Grafana, Splunk, CloudWatch, AppDynamics, Prometheus, or similar monitoring platforms is preferred.

AWS Cloud

Strong hands-on experience with AWS Cloud services.
Understanding of cloud architecture, monitoring, logging, and performance optimization.
Experience supporting cloud-native and distributed applications.

DevOps & CI/CD

Experience with CI/CD tools and DevOps practices.
Familiarity with deployment pipelines, release processes, and production change management.
Experience working in Agile and DevOps environments.

Application & System Knowledge

Good understanding of APIs, Microservices Architecture, and Distributed Systems.
Experience troubleshooting application performance and infrastructure-related issues.
Understanding of networking fundamentals, web servers, and application architectures.

Scripting & Automation

Working knowledge of Python, Bash, Java, or similar scripting/programming languages.
Experience automating operational and monitoring tasks is preferred.

Nice to Have

Experience with Kubernetes and Docker.
Exposure to Infrastructure as Code (Terraform, CloudFormation).
Knowledge of Site Reliability Engineering best practices.
Experience supporting high-volume enterprise applications.
ITIL Foundation certification or equivalent.

More Info

Job Type: Permanent Job
Industry: Other
Function: Site Reliability Engineering
Employment Type: Full time

About Company

Creospan Private Limited

Job ID: 148875713

Last Updated: 14-06-2026 06:03:51 PM
