AI Architect Engineer (Hands-on coder) WFH

Qubrid AI

India

3-5 Years

Job Description

Read everything carefully. The requirements and screening questions are critical: incorrect or unsatisfactory answers will result in auto-rejection and waste your time.

Work from Home.
This is a full-time role. If you plan to hold two or more jobs at the same time, or want to do this part-time, that won't work for us. In that case, please do not apply, as your application will be auto-rejected.
Note: this job requires working late at night, until 4 AM India time, to overlap with US working hours. Do not apply if this timing doesn't work for you.
Salary depends on experience and current compensation, verifiable through paychecks.
Mid-level candidates with 3-5 years of experience are suitable.

AI Architect (Hands-On) — Multi-Agent AI Platform

About Qubrid AI

Qubrid AI is building next-generation AI infrastructure focused on inference, GPUs, multi-model orchestration, and scalable AI deployments. Our mission is simple: democratize access to AI infrastructure - from developers spending their first $5 to enterprise-scale AI deployments processing billions of inference requests. We are looking for a deeply technical AI Architect who can design and build production-grade AI systems end-to-end, not just create architecture diagrams.

This role is for builders.

You should be equally comfortable:
writing production Python code
optimizing inference pipelines
working with open-source models
building multi-agent systems
designing scalable backend architectures
deploying AI systems into production

If you are primarily theoretical or management-focused, this role is probably not the right fit.

What You'll Build

You will help architect and build a full-stack multi-agent AI SaaS platform including:

Multi-agent orchestration systems
AI inference pipelines
Fine-tuning workflows
RAG systems
Tool-calling architectures
Memory and context management systems
Model routing and optimization layers
Backend APIs and distributed systems
GPU-aware inference infrastructure
Enterprise-grade scalable deployments

This is a highly hands-on engineering role where architecture and implementation go together.

Responsibilities

AI Systems & Multi-Agent Architecture

Design and build production-grade multi-agent AI systems
Develop orchestration frameworks for autonomous workflows
Implement agent communication, memory, planning, and tool usage
Build scalable RAG and retrieval pipelines
Design long-context and multi-modal workflows

Inference & Model Infrastructure

Optimize inference pipelines for latency and throughput
Work with open-source models including Llama, Qwen, Kimi, Mistral, DeepSeek, Gemma, Flux, SDXL, and other frontier/open models
Implement model serving infrastructure using technologies like vLLM, TensorRT-LLM, TGI, Ollama, SGLang, and Ray Serve
Build intelligent model routing and fallback systems
Improve GPU utilization and inference efficiency

Fine-Tuning & Model Optimization

Build and manage fine-tuning pipelines
Work with LoRA / QLoRA, PEFT, RLHF/RLAIF concepts, quantization, and distillation
Evaluate models across latency, quality, and cost tradeoffs

Backend & Platform Engineering

Develop scalable backend systems using Python
Design APIs, microservices, async workflows, and distributed systems
Build production-grade SaaS architecture
Implement observability, logging, monitoring, and reliability systems
Work with vector databases, caching systems, queues, and storage layers

Deployment & Infrastructure

Deploy AI systems on cloud and GPU infrastructure
Work with Kubernetes, Docker, and scalable orchestration systems
Build highly available inference infrastructure
Optimize infrastructure costs and scalability

Requirements

General requirements

3-5 years of experience in AI architecture and system design
Strong hands-on Python expertise
Proven experience building production AI systems
Experience with LLM inference optimization
Deep understanding of transformer architectures and modern LLM ecosystems
Experience with open-source model deployment
Strong backend engineering experience
Experience designing scalable SaaS platforms
Experience with APIs, async systems, and distributed architectures
Strong debugging and systems-thinking ability

AI/ML Experience

Multi-agent systems
RAG architectures
Fine-tuning pipelines
Embeddings and vector databases
Tool-calling frameworks
Model evaluation and benchmarking
Prompt orchestration and workflow systems

Infrastructure Experience

Docker
Kubernetes
GPU infrastructure
CI/CD pipelines
Cloud platforms (AWS/GCP/Azure)
Distributed inference systems

What We're Looking For

We are specifically looking for engineers who:

build things themselves
move fast
can go from idea to production
understand both AI and systems engineering
can architect and implement
are comfortable operating in ambiguity
care about performance and scalability
are obsessed with execution

This is not a slide-deck architect role.

You should be able to:

write production code daily
review system bottlenecks
optimize inference performance
debug distributed systems
build MVPs rapidly
scale products into production systems

Bonus Points

Experience building AI SaaS products from scratch
Experience with agentic frameworks
Experience with GPU optimization
Contributions to open-source AI projects
Experience with large-scale inference systems
Startup experience
Experience working with high-growth engineering teams

If you want to help shape the future of AI infrastructure and build systems that can scale from startup experimentation to enterprise deployments, we'd love to talk.