We are looking for a motivated Site Reliability Engineer (SRE) who will play a crucial role in driving operational excellence in our software development teams by ensuring the availability, performance and scalability of our production systems. You will work closely with one or more software development teams to raise the bar on their observability practices, enhance incident response capabilities and help reduce operational toil through automation.
Key Responsibilities
- Implement and manage the observability stack (metrics, logs, traces and alerts) to ensure optimal performance and availability
- Analyze observability data to proactively identify performance bottlenecks and drive reliability improvements
- Define, track and report on Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for key services
- Identify, develop and implement automation tools to reduce operational toil and improve system reliability
- Conduct blameless incident postmortems and drive preventive measures
- Collaborate with developers to improve service reliability through better design, testing, and deployment practices
- Assist developers in troubleshooting complex issues by delving into the available observability data
- Advocate for SRE best practices within the embedded team and contribute to wider company SRE initiatives
Your profile
- Hands-on experience with managing and using monitoring tools (e.g. ELK, Grafana, Prometheus)
- 3+ years of experience in a Site Reliability Engineering, DevOps, or Systems Engineering role
- Experience with CI/CD tooling (e.g. Jenkins, GitLab CI, Argo CD)
- Experience with cloud platforms (preferably GCP)
- Comfortable with at least one scripting language
- Experience working in large, complex production environments
- Excellent problem-solving, communication, and collaboration skills
- GCP: BigQuery, Airflow, Cloud Storage
- Observability: ELK and Grafana
- DevOps: CI/CD with GitLab and Jenkins
- Integration background
- Senior profile
Bonus points
- Familiarity with Infrastructure as Code (IaC) principles and tools (e.g. Terraform, Ansible)
- Experience with containerization and orchestration technologies (e.g. Docker, Kubernetes)
- Experience working with distributed systems
- Familiarity with Java software development
What's In It For You
A family atmosphere and a people-centric culture, where your emotional and physical well-being matters.
A company of great colleagues with a global mindset, where you feel welcomed from day one.
A competitive salary, medical insurance for family, and retirement benefits.
A healthy work-life balance.
Internal career opportunities and professional development, including access to LinkedIn Learning and many in-house and external training courses.
Job security, working for a global company with a strong presence and commitment in India.
PEOPLE ARE AT OUR HEART
TVH is a global business with a family atmosphere, where people are at the center. We value clarity, mutual respect, kindness and open communication. Our people are down-to-earth, easy to work and engage with. We welcome differences and celebrate new ideas.
About TVH
TVH is a specialist in quality parts and accessories for material handling, industrial vehicles, and construction and agricultural equipment. Working at TVH means choosing a company that excels as an international market leader and is well known for its unstoppable craving for innovation.