Senior DevOps Engineer

RDash (YC W22)

Gurugram, India

4-6 Years

Job Description

Gurgaon | 4+ Years Experience | Full-time

About RDash

RDash is a Y Combinator-backed SaaS platform that simplifies construction project management with AI-powered tools. Thousands of professionals across India and the UAE use RDash to manage projects end-to-end, from procurement and budgets to daily progress reports. We move fast, ship often, and care deeply about building software that works on real job sites.

The Role

You'll own the infrastructure, security, and reliability of RDash's production platform. This is a hands-on role: you'll architect cloud infrastructure, harden systems, build observability, and debug production issues across Kubernetes, networking, and application layers.

A core part of the mission is contributing to RDash's Agentic SDLC platform (in development), where AI agents will autonomously generate, test, and deploy applications. You'll build the infrastructure foundations that make autonomous agent execution possible: sandboxed environments, dynamic pod lifecycle management, guardrails for non-deterministic workloads, and observability that feeds back into agent decision-making.

What You'll Do

Infrastructure & Cloud

• Architect and operate production AKS clusters and Azure infrastructure: VNets, NSGs, Private Endpoints, Azure Firewall, and Application Gateway

• Design cloud networking end-to-end: VNet peering, subnet segmentation, route tables, DNS resolution, and firewall rules

• Ship Terraform modules that are composable, version-pinned, and reusable, with remote state, workspace isolation, and zero drift across environments

• Own cost optimization: budgets, alerts, right-sizing node pools, and spot instances. Wasted cloud spend is a bug.

Kubernetes & Containers

• Debug Kubernetes in production: pod scheduling failures, CNI issues, OOMKills, resource contention, and CrashLoopBackOff across dynamic workloads

• Enforce pod security, network policies, resource quotas, and LimitRanges for multi-tenant workload isolation

• Own container supply chain security: image scanning (Trivy), base image hardening, and ACR access controls

• Manage the Helm chart lifecycle: versioning, environment-specific values, dependency management, and rollback strategies

CI/CD & Deployment

• Own CI/CD end-to-end in GitHub Actions: build pipelines, deployment strategies (blue-green, canary), and automated rollbacks

• Implement GitOps practices: declarative configs, drift detection, and Git as the single source of truth

• Build developer experience tooling: self-service deployments, preview environments, and fast feedback loops

Observability & Security

• Build centralized observability with Prometheus, Grafana, Loki, and OpenTelemetry, plus Microsoft Sentinel for SIEM

• Own incident response: severity definitions, escalation paths, on-call rotations, and blameless post-mortems

• Harden everything: WireGuard VPN, RBAC, PIM/PAM, WAF rules, VAPT remediation, and audit logging

• Manage compliance: SOC 2 evidence collection, access reviews, DLP controls, and secrets management via Azure Key Vault

• Own the certificate lifecycle: cert-manager, Let's Encrypt, and an internal CA for mTLS where needed

Databases & Automation

• Manage PostgreSQL and MongoDB in production: query tuning, connection pooling, HA, and backup/restore with tested recovery drills

• Manage Redis and NATS for caching and event streaming: cluster health, persistence, and failover testing

• Automate relentlessly with Python and Bash: if you're doing it twice, script it.

• Design for failure: DR plans, blast radius containment, and capacity planning for workloads that can scale 10x in minutes.

What You Bring

Must-Have

• 4+ years of hands-on DevOps / Platform / SRE experience

• Deep Kubernetes expertise — production debugging, networking, multi-tenant isolation

• Strong Terraform skills — modular code, remote state, workspace management

• Production experience with Azure (AKS, VNets, Key Vault, ACR, Firewall) and/or AWS

• CI/CD ownership — GitHub Actions, deployment strategies, rollback mechanisms

• Security hardening: VPN, RBAC, PIM/PAM, WAF/IPS, VAPT, and SOC 2 compliance

• PostgreSQL in production — query tuning, HA, backup/restore

• Strong scripting skills in Python and Bash

• Comfort with on-call, incident ownership, and post-mortems

Nice-to-Have

• Experience with non-deterministic or AI/ML workloads, such as GPU scheduling and dynamic pod scaling

• Exposure to agentic systems, LLM pipelines, or autonomous execution platforms

• Experience with NATS, Redis, or event-driven architectures

• GitOps tooling (ArgoCD, Flux) and container supply chain security (Trivy, SBOM)

• FinOps experience — cloud cost optimization and resource forecasting

Who You Are

• You lead with action — you see something broken, you fix it, you document it, you move on

• You're opinionated about infrastructure quality but pragmatic about deadlines

• You debug under pressure and bring structure to chaos when prod goes down

• You think about the platform as a product, not a support function

• You're excited about building infrastructure for AI-native systems, not just maintaining legacy setups

What We Offer

• Competitive compensation

• Platform-wide ownership in a fast-growing, YC-backed startup

• Direct influence on infrastructure, security, and platform architecture

• Contribute to building an agentic AI platform from the ground up

• Small team, zero bureaucracy — you ship on Day 1

• A collaborative, no-BS culture that values shipping over slideware

RDash is an equal-opportunity employer. We welcome applicants of all backgrounds, identities, and experiences. If you're excited about building the platform behind an AI-native product that's changing how construction gets managed, we'd love to hear from you.