
Platform Engineer (Reinforcement Learning Systems)
Overview
We are looking for a Platform Engineer to build the infrastructure, tooling, and systems that power large-scale Reinforcement Learning (RL) workflows. This role focuses on enabling researchers to train, evaluate, and deploy RL models efficiently by providing a scalable and reliable experimentation platform.
You will work at the intersection of distributed systems engineering and ML research, building platforms that abstract away infrastructure complexity and enable self-serve experimentation for research teams.
About Deccan AI
Deccan AI is a fast-growing, venture-backed AI infrastructure company focused on training, evaluating, and improving next-generation AI systems. Headquartered in the Bay Area, with a growing India hub in Hyderabad, the company was founded by alumni of IIT Bombay and IIM Ahmedabad and by former Google leaders.
We work with some of the world's leading frontier AI labs and research organizations, including Google DeepMind, Snowflake, and other cutting-edge AI teams. Backed by Prosus Ventures, Deccan AI recently raised $25M in Series A funding and is entering a significant growth phase.
With a global network of over 1 million experts, advanced automation systems, and vertically integrated platforms, we deliver the high-quality data and evaluation infrastructure that state-of-the-art AI models depend on. As the AI infrastructure market rapidly expands, Deccan AI is building the systems powering the future of AI.
What You'll Do
RL Training Infrastructure
Data & Simulation Pipelines
Performance & System Optimization
Observability & Experimentation
Simulation-to-Real Support
Required Skills & Experience
Technical Skills
ML / RL Knowledge
Infrastructure Expertise
Preferred Qualifications
What We're Looking For
Why This Role Matters
This role is critical to enabling next-generation RL research. You will build the foundational platform that allows researchers to run experiments faster, scale training efficiently, and iterate seamlessly, directly accelerating advances in reinforcement learning systems.
Job ID: 147426861