
Role: Site Reliability Engineer (SRE), Core IT Infrastructure
Location: Chennai
Work mode: On-site (full-time)
Experience: 6+ years
Key Responsibilities
Infrastructure Reliability & Operations
Design, implement, and maintain highly available and fault-tolerant infrastructure
Ensure reliability, performance, scalability, and security of core IT systems
Monitor system health, capacity, and performance using proactive observability practices
Lead incident response, root cause analysis (RCA), and post-incident reviews
Automation & SRE Development
Develop and maintain automation tools, scripts, and frameworks to reduce manual operations
Apply Infrastructure as Code (IaC) principles using tools such as Terraform, Ansible, or CloudFormation
Build self-healing systems and automate repetitive operational tasks
Improve deployment pipelines and operational workflows through engineering solutions
DevOps & Platform Engineering
Collaborate with DevOps, development, and security teams to support CI/CD pipelines
Enable seamless application deployments with minimal downtime
Support container and orchestration platforms (Docker, Kubernetes, OpenShift)
Implement best practices for configuration management and environment consistency
Monitoring, Observability & Performance
Design and maintain monitoring, logging, and alerting systems
Define and track SLIs, SLOs, and SLAs
Optimize system performance, capacity planning, and cost efficiency
Enhance observability using tools such as Prometheus, Grafana, ELK, Datadog, or similar
Security & Compliance
Implement infrastructure security best practices
Collaborate with security teams on vulnerability management and compliance requirements
Ensure secure access, identity management, and audit readiness
Required Skills & Qualifications
Technical Skills
Strong experience in Linux/Unix system administration
Proficiency in programming/scripting (Python, Go, Bash/shell, or similar)
Experience with cloud platforms (AWS, Azure, or GCP)
Hands-on experience with containerization and orchestration
Knowledge of networking concepts (DNS, TCP/IP, load balancing, firewalls)
Experience with monitoring, logging, and alerting tools
Job ID: 141924901