DevOps Engineering Lead
- Location: On-site Gurgaon (Hybrid)
- Department: Technology / Engineering
- Experience Level: 8+ Years
- Employment Type: Full-Time
ABOUT THE ROLE
We are looking for a highly skilled Lead DevOps Engineer to join our team and help build, scale, and maintain a reliable messaging platform that powers seamless communication for millions of users.
You'll be responsible for designing cloud-native infrastructure, automating deployments, ensuring high availability, and driving operational excellence in a fast-paced environment.
KEY RESPONSIBILITIES
Infrastructure & Deployment
- Design, implement, and manage scalable, resilient cloud infrastructure (AWS/GCP/Azure) for messaging workloads.
- Build CI/CD pipelines to enable automated, reliable, and fast delivery of new features.
- Containerize applications (Docker/Kubernetes) and optimize orchestration for performance.
Reliability & Monitoring
- Ensure high availability and low latency of the messaging platform with proactive monitoring and alerting (Prometheus, Grafana, ELK, Datadog, etc.).
- Troubleshoot production issues, perform root cause analysis, and implement long-term fixes.
- Define and track SLOs/SLAs/SLIs for messaging services.
Automation & Security
- Automate provisioning, scaling, and failover processes using Infrastructure as Code (Terraform, Ansible, Helm).
- Enforce best practices for system security, secrets management, and compliance.
- Implement disaster recovery, backup strategies, and incident response playbooks.
Collaboration & Culture
- Work closely with developers, SREs, and QA teams to deliver reliable features.
- Advocate for DevOps culture: CI/CD adoption, monitoring-first mindset, and blameless postmortems.
- Contribute to documentation and knowledge sharing across teams.
REQUIRED SKILLS & QUALIFICATIONS
- 8+ years of experience in DevOps/SRE/Cloud Engineering roles.
- Strong experience with Kubernetes, Docker, and CI/CD pipelines.
- Hands-on expertise in cloud platforms (AWS/GCP/Azure) and Infrastructure as Code (Terraform/CloudFormation).
- Solid background in Linux systems, networking, and messaging systems and protocols (e.g., Kafka, RabbitMQ, MQTT, WebSockets).
- Experience with monitoring, logging, and observability stacks.
- Knowledge of scripting/programming (Python, Bash, Go, etc.).
PREFERRED SKILLS
- Experience with real-time, high-throughput systems (messaging, streaming, or event-driven architectures).
- Exposure to scaling microservices in production.
- Familiarity with security best practices in distributed systems.
WHY JOIN US
- Opportunity to build and scale a mission-critical messaging platform used globally.
- Work with a passionate, talented team driving innovation in cloud-native communication.
- Competitive salary.