Senior DevOps Engineer

Grid Dynamics

Chennai, India

6-15 Years


Job Description

We are seeking an experienced SRE Observability Engineer to join a global Monitoring and Observability team responsible for building, scaling, and maintaining enterprise-grade observability solutions. This role focuses on modernizing monitoring platforms, driving end-to-end observability strategy, and enabling data-driven decision-making across large-scale distributed systems.

The ideal candidate brings deep expertise in cloud-native technologies, observability tools, and automation, along with strong collaboration and communication skills to influence technical and business stakeholders.

Responsibilities

Operate within a globally distributed environment, supporting large-scale systems.
Collaborate with cross-functional teams to design and implement observability solutions for enterprise-wide adoption.
Manage and enhance legacy monitoring platforms while contributing to modernization initiatives.
Drive the implementation of end-to-end observability solutions across metrics, logs, and traces.
Analyze complex system behaviors and provide insights to solve performance and reliability challenges.
Influence strategic decisions by providing technical guidance and recommendations.
Communicate effectively with stakeholders and promote best practices in observability and SRE.
Contribute to documentation, knowledge sharing, and continuous improvement initiatives.
Perform additional duties as required to support operational excellence.

Requirements

Experience: 6–15 years in Site Reliability Engineering, DevOps, or Observability Engineering.
Strong experience with OpenShift/Kubernetes administration, including deployment, troubleshooting, resource management, and networking.
Hands-on expertise with Grafana and observability ecosystems, including:

Grafana administration (dashboards, alerts, data sources, user management)
Experience with Prometheus and PromQL
Working knowledge of backend components such as Mimir (metrics), Loki (logs), and Tempo (traces)
Experience with enterprise monitoring tools such as ITRS Geneos or similar

Experience with Helm charts for application deployment and management (including dependencies and customization).
Strong scripting and automation skills using Bash or Python.
Ability to create clear, concise, and well-structured technical documentation.
Excellent analytical, problem-solving, and communication skills.

Nice to have

Experience with application deployment platforms such as Lightspeed Enterprise (or similar).
Exposure to Google Cloud Platform (GCP) operations and services.
Familiarity with modern cloud-native observability frameworks and practices.
Experience in large-scale enterprise environments with distributed systems.

We offer

Opportunity to work on bleeding-edge projects
Work with a highly motivated and dedicated team
Competitive salary
Flexible schedule
Benefits package, including medical insurance and sports
Corporate social events
Professional development opportunities
Well-equipped office

About Us

Grid Dynamics (NASDAQ: GDYN) is a leading provider of technology consulting, platform and product engineering, AI, and advanced analytics services. Fusing technical vision with business acumen, we solve the most pressing technical challenges and enable positive business outcomes for enterprise companies undergoing business transformation. A key differentiator for Grid Dynamics is our 8 years of experience and leadership in enterprise AI, supported by profound expertise and ongoing investment in data, analytics, cloud & DevOps, application modernization and customer experience. Founded in 2006, Grid Dynamics is headquartered in Silicon Valley with offices across the Americas, Europe, and India.