We are hiring a Senior Reliability Engineer to join our newly formed Reliability Engineering Team (RET), a team that operates like a product engineering squad, focused on building reliability as a platform capability across the organization.
This is not a support or operations role. It is a core software engineering position where you will design, build, and ship shared reliability solutions that empower multiple product teams with safe deployments, deep observability, and resilient runtime systems.
Responsibilities
Design & build reliability platforms: services, libraries, CLIs, and automation used across teams
Develop deployment controllers, config validators, tracing libraries, queue monitors & more
Own the end-to-end lifecycle: design, implementation, testing, rollout, and evolution
Define APIs, SDKs, templates, Helm charts, Terraform modules & pipelines for easy adoption
Drive architecture decisions around rollout strategies, failure modes & resilience patterns
Use production insights & incident data to shape the reliability roadmap
Embed reliability into the SDLC (design reviews, golden paths, reference implementations)
Contribute through code reviews, documentation, mentoring & design sessions
Requirements
5-7 years of strong backend/platform engineering experience
Proficiency in Java, Kotlin, C#, Go, or Python
Experience building production-grade systems, libraries, or shared tooling
Strong understanding of distributed systems & microservices architecture
Experience working in cloud-native environments (Kubernetes is a plus)
Hands-on implementation of observability (metrics, tracing, logging)
Experience building resilience patterns (retries, circuit breakers, timeouts, graceful degradation)
Strong engineering practices: automated testing, clean code, CI/CD, trunk-based development
Experience treating Infrastructure-as-Code (Terraform, Helm, GitOps) as engineering artefacts
Ability to translate reliability challenges into scalable engineering solutions & APIs
Nice to have
Experience designing internal developer platforms
Exposure to deployment strategies (blue/green, canary releases)
Experience with performance engineering & load testing
Experience mentoring engineers or leading design initiatives
We offer
- Opportunity to work on bleeding-edge projects
- Work with a highly motivated and dedicated team
- Competitive salary
- Flexible schedule
- Benefits package: medical insurance, sports
- Corporate social events
- Professional development opportunities
- Well-equipped office
About Us
Grid Dynamics (NASDAQ: GDYN) is a leading provider of technology consulting, platform and product engineering, AI, and advanced analytics services. Fusing technical vision with business acumen, we solve the most pressing technical challenges and enable positive business outcomes for enterprise companies undergoing business transformation. A key differentiator for Grid Dynamics is our 8 years of experience and leadership in enterprise AI, supported by profound expertise and ongoing investment in data, analytics, cloud & DevOps, application modernization and customer experience. Founded in 2006, Grid Dynamics is headquartered in Silicon Valley with offices across the Americas, Europe, and India.