Grid Dynamics

Senior Site Reliability Engineer

Job Description

We are hiring a Senior Reliability Engineer to join our newly formed Reliability Engineering Team (RET), a team that operates like a product engineering squad and focuses on building reliability as a platform capability across the organization.

This is not a support or operations role. It is a core software engineering position where you will design, build, and ship shared reliability solutions that empower multiple product teams with safe deployments, deep observability, and resilient runtime systems.

Responsibilities

Design and build reliability platforms: services, libraries, CLIs, and automation used across teams

Develop deployment controllers, config validators, tracing libraries, queue monitors & more

Own the end-to-end lifecycle: design, implementation, testing, rollout, and evolution

Define APIs, SDKs, templates, Helm charts, Terraform modules & pipelines for easy adoption

Drive architecture decisions around rollout strategies, failure modes & resilience patterns

Use production insights & incident data to shape the reliability roadmap

Embed reliability into the SDLC (design reviews, golden paths, reference implementations)

Contribute through code reviews, documentation, mentoring & design sessions

Requirements

5-7 years of strong backend/platform engineering experience

Proficiency in Java, Kotlin, C#, Go, or Python

Experience building production-grade systems, libraries, or shared tooling

Strong understanding of distributed systems & microservices architecture

Experience working in cloud-native environments (Kubernetes is a plus)

Hands-on experience implementing observability (metrics, tracing, logging)

Experience building resilience patterns (retries, circuit breakers, timeouts, graceful degradation)

Strong engineering practices: automated testing, clean code, CI/CD, trunk-based development

Experience treating Infrastructure-as-Code (Terraform, Helm, GitOps) as engineering artefacts

Ability to translate reliability challenges into scalable engineering solutions & APIs

Nice to have

Experience designing internal developer platforms

Exposure to deployment strategies (blue/green, canary releases)

Experience with performance engineering & load testing

Experience mentoring engineers or leading design initiatives

We offer

  • Opportunity to work on bleeding-edge projects
  • Work with a highly motivated and dedicated team
  • Competitive salary
  • Flexible schedule
  • Benefits package - medical insurance, sports
  • Corporate social events
  • Professional development opportunities
  • Well-equipped office

About Us

Grid Dynamics (NASDAQ: GDYN) is a leading provider of technology consulting, platform and product engineering, AI, and advanced analytics services. Fusing technical vision with business acumen, we solve the most pressing technical challenges and enable positive business outcomes for enterprise companies undergoing business transformation. A key differentiator for Grid Dynamics is our 8 years of experience and leadership in enterprise AI, supported by profound expertise and ongoing investment in data, analytics, cloud & DevOps, application modernization and customer experience. Founded in 2006, Grid Dynamics is headquartered in Silicon Valley with offices across the Americas, Europe, and India.

Job ID: 144711567