Job Details:
Job Title: Sr. Site Reliability Engineer (SRE)
Duration: Contract to Hire (On the Payroll of Datum Technology Group)
Location: Chennai / Mumbai / Gurugram
Interview Process: Virtual (2 rounds) plus 1 technical screening.
Job Description:
- We are seeking a highly skilled Senior Site Reliability Engineer (SRE) to enhance reliability, scalability, and performance across our cloud infrastructure, with a strong emphasis on cloud security, compliance, networking, and Linux operating systems expertise.
- This role combines reliability engineering with security best practices to ensure our cloud infrastructure is resilient, secure, and compliant.
Responsibilities:
- Develop and maintain Infrastructure as Code (IaC) using Terraform, including advanced module design and best practices for highly complex environments.
- Design and optimize CI/CD pipelines with a focus on automation, scalability, and deployment efficiency, and be able to discuss and implement pipeline optimizations drawn from prior experience.
- Collaborate with development teams to integrate security and observability tools into CI/CD pipelines, automating security checks.
- Address vulnerabilities in code libraries and infrastructure (e.g., OS packages) through patching and remediation.
- Partner with application teams to resolve specific security findings and improve overall system resilience.
- Troubleshoot and debug networking issues across cloud and hybrid environments, applying a deep understanding of networking layers, components, and configurations.
- Administer and optimize Linux-based operating systems, including troubleshooting, performance tuning, and implementing best practices for security and reliability.
Requirements:
- 6+ years of experience in DevOps, Site Reliability Engineering (SRE), or Cloud Engineering.
- Deep knowledge of networking fundamentals, Linux operating systems, and CI/CD optimization strategies.
- Very strong expertise in writing complex Terraform code, including advanced module design and best practices for large-scale, highly complex environments.
- Proficiency in scripting or programming languages (e.g., Python, Bash, Go).
- Hands-on experience with the Azure cloud platform.
- Very strong grasp of basic networking concepts (OSI and TCP/IP models, IP addressing and subnetting, DNS, HTTP and HTTPS, etc.).
- Strong Linux OS administration and troubleshooting skills.
- Ability to write complex Terraform code.
- Solid understanding of Azure cloud and CI/CD concepts.
Bonus/Preferred Skills:
- Experience with Docker and Kubernetes for containerization and orchestration.