
Karix Mobile

Site Reliability Engineer


Job Description

JD – Lead Site Reliability Engineer (SRE)

Location – Chennai

About the Role: We are looking for a Lead Site Reliability Engineer (SRE) with strong experience in managing production systems, distributed architectures, and cloud-native environments. This role focuses on ensuring system reliability, scalability, and performance while driving SRE best practices across teams. You will work closely with engineering and product teams to improve system resilience, automate operations, and lead incident management, while mentoring junior engineers and owning reliability initiatives end-to-end.

What you'll be responsible for

• Lead troubleshooting and resolution of complex production issues in distributed systems.

• Drive reliability engineering practices, ensuring high availability and performance of systems.

• Manage and optimize messaging systems like Apache Kafka, RabbitMQ, and Redis.

• Architect, manage, and optimize Kubernetes clusters for scalability and resilience.

• Manage CI/CD pipelines and drive deployment automation.

• Implement and maintain monitoring, alerting, and observability using Prometheus, Grafana, and the ELK stack.

• Lead incident management, root cause analysis (RCA), and post-mortem reviews.

• Mentor junior engineers and collaborate with cross-functional teams to improve system design and reliability.

What you'd have

• 5+ years of experience in SRE / DevOps / Production Engineering roles.

• Strong expertise in troubleshooting distributed systems and microservices architecture.

• Hands-on experience with Kafka, RabbitMQ, and Redis.

• Strong knowledge of Kubernetes and container orchestration.

• Experience with CI/CD pipelines and deployment automation.

• Solid understanding of Linux, networking, and cloud platforms (AWS / Azure / GCP).

• Experience with Infrastructure as Code (Terraform, Ansible).

• Strong scripting skills (Python, Bash, or similar).

• Database experience: MySQL / Oracle / MongoDB.

• Strong problem-solving skills, an ownership mindset, and the ability to lead initiatives.

Why join us

• Impactful Work: Play a key role in ensuring reliability and scalability of platforms that handle large-scale, real-time communication systems.

• Tremendous Growth Opportunities: Accelerate your career by leading critical reliability initiatives and working on high-scale distributed systems.

• Innovative Environment: Work in a fast-paced ecosystem that embraces automation, cloud-native technologies, and continuous improvement.

Karix is an equal opportunity employer. We champion diversity and are committed to creating an inclusive environment for all employees.

Job ID: 147501421
