Deccan AI

Staff Software Engineer

Job Description

Staff Engineer - Distributed AI Systems & Platform Infrastructure

Deccan AI | Hyderabad, India | Full-Time

About Deccan AI

Deccan AI is a venture-backed AI infrastructure company ($25M Series A, Prosus Ventures), headquartered in the Bay Area with an engineering hub in Hyderabad. We build the infrastructure layer for Reinforcement Learning and Agentic AI and work with frontier labs including Google DeepMind and Snowflake.

The Role

Not many engineers can go deep on distributed systems architecture and still own the cloud infrastructure, security posture and logging stack that holds it all together. Most choose depth or breadth. We need both - and your profile suggests you might be one of the rare few who have built that way.

As Staff Engineer you will own the full technical surface of Deccan AI's platform - designing distributed AI systems that scale, running them on secure and reliable cloud infrastructure, and building the observability layer that tells you the truth about what is actually happening in production.

What You Will Own

Distributed and Scalable AI Systems

  • Architect and operate distributed systems and scalable AI systems powering RL training pipelines and agentic workflows in production
  • Write high-performance backend services in Python or Java, and reach for Go when latency and throughput demand it
  • Design for scale from day one - P99 latency, high throughput, fault tolerance and SLA ownership
  • Build for failure - chaos testing, graceful degradation and circuit breakers are not afterthoughts here

Cloud Infrastructure

  • Own cloud infrastructure on AWS or GCP across compute, networking, storage and Kubernetes at scale
  • Build infrastructure as code, automated provisioning and CI/CD pipelines that the engineering team trusts and depends on
  • Make cloud cost and performance decisions backed by data, not assumptions
  • Drive platform standardization that makes the whole team move faster

Security - Designed In, Not Added On

  • Architect zero-trust security across AI workloads covering model access, data pipelines and API boundaries
  • Own IAM, secrets management and network policies as first-class engineering responsibilities
  • Make security a constraint that shapes system design from the first conversation, not a checklist at the end
  • Build threat models that are living engineering tools, not documents filed and forgotten

Logging and Observability

  • Define and own logging standards and distributed tracing pipelines across the entire platform
  • Build a monitoring and alerting stack that surfaces what matters and cuts the noise
  • Design distributed tracing that finds latency bottlenecks and failure patterns before they become incidents
  • Drive post-mortems that produce architectural change, not just action items

You Are Likely a Fit If You Have

  • 10+ years owning distributed systems, scalable systems or AI systems end to end in production
  • Production depth in Python or Java, with Go for performance-critical, latency-sensitive services
  • Designed and operated cloud infrastructure on AWS or GCP under real security and reliability constraints
  • Gone deep on security fundamentals - zero-trust, IAM, secrets management, network policies
  • Built and owned logging and observability as a product that engineers rely on, not an ops checkbox
  • Debugged production incidents at scale and come out with better architecture every time

Bonus if you have:

  • Experience with RL infrastructure, pipelines or agentic system backends
  • Worked on ML infrastructure - training pipelines, model serving, feature stores
  • Open-source contributions in distributed systems, infrastructure or observability tooling
  • SOC 2 or equivalent compliance implementation in a high-velocity engineering environment

Our Stack

Python, Java, Go, Kubernetes, Kafka, Ray, AWS, GCP, Terraform, Prometheus, Grafana

Why This Role is Worth Your Time

Most Staff Engineer roles ask you to go deep or go broad. This one asks for both - and gives you the technical autonomy, early-stage equity and direct founder access to do it properly. You will work on AI infrastructure problems that are genuinely unsolved, alongside a team that has built at scale before and chose to come back and do it again here.

Job ID: 149017639