Read everything carefully. The requirements and screening questions are critical: answers that are incorrect or unsatisfactory will result in auto-rejection and waste your time.
- Work from Home.
- This is a full-time role. If you plan to hold two or more jobs at the same time, or want to do this part-time, that won't work for us. In that case, please do not apply, as your application will be auto-rejected.
- Note: this job requires working late nights, until 4 AM India time, to overlap with US working hours. Do not apply if this timing doesn't work for you.
- Salary depends on experience and on current compensation that can be verified through paychecks.
- Mid-level candidates with 3-5 years of experience are suitable.
AI Architect (Hands-On) — Multi-Agent AI Platform
About Qubrid AI
Qubrid AI is building next-generation AI infrastructure focused on inference, GPUs, multi-model orchestration, and scalable AI deployments. Our mission is simple: democratize access to AI infrastructure - from developers spending their first $5 to enterprise-scale AI deployments processing billions of inference requests. We are looking for a deeply technical AI Architect who can design and build production-grade AI systems end-to-end, not just create architecture diagrams.
This role is for builders. You should be equally comfortable:
- writing production Python code
- optimizing inference pipelines
- working with open-source models
- building multi-agent systems
- designing scalable backend architectures
- deploying AI systems into production
If you are primarily theoretical or management-focused, this role is probably not the right fit.
What You'll Build
You will help architect and build a full-stack multi-agent AI SaaS platform including:
- Multi-agent orchestration systems
- AI inference pipelines
- Fine-tuning workflows
- RAG systems
- Tool-calling architectures
- Memory and context management systems
- Model routing and optimization layers
- Backend APIs and distributed systems
- GPU-aware inference infrastructure
- Enterprise-grade scalable deployments
This is a highly hands-on engineering role where architecture and implementation go together.
Responsibilities
AI Systems & Multi-Agent Architecture
- Design and build production-grade multi-agent AI systems
- Develop orchestration frameworks for autonomous workflows
- Implement agent communication, memory, planning, and tool usage
- Build scalable RAG and retrieval pipelines
- Design long-context and multi-modal workflows
Inference & Model Infrastructure
- Optimize inference pipelines for latency and throughput
- Work with open-source models including Llama, Qwen, Kimi, Mistral, DeepSeek, Gemma, Flux, SDXL, and other frontier/open models
- Implement model-serving infrastructure using technologies such as:
- vLLM
- TensorRT-LLM
- TGI
- Ollama
- SGLang
- Ray Serve
- Build intelligent model routing and fallback systems
- Improve GPU utilization and inference efficiency
Fine-Tuning & Model Optimization
- Build and manage fine-tuning pipelines
- Work with:
- LoRA / QLoRA
- PEFT
- RLHF/RLAIF concepts
- Quantization
- Distillation
- Evaluate models across latency, quality, and cost tradeoffs
Backend & Platform Engineering
- Develop scalable backend systems using Python
- Design APIs, microservices, async workflows, and distributed systems
- Build production-grade SaaS architecture
- Implement observability, logging, monitoring, and reliability systems
- Work with vector databases, caching systems, queues, and storage layers
Deployment & Infrastructure
- Deploy AI systems on cloud and GPU infrastructure
- Work with Kubernetes, Docker, and scalable orchestration systems
- Build highly available inference infrastructure
- Optimize infrastructure costs and scalability
Requirements
General requirements
- 3-5 years of experience in AI architecture and system design
- Strong hands-on Python expertise
- Proven experience building production AI systems
- Experience with LLM inference optimization
- Deep understanding of transformer architectures and modern LLM ecosystems
- Experience with open-source model deployment
- Strong backend engineering experience
- Experience designing scalable SaaS platforms
- Experience with APIs, async systems, and distributed architectures
- Strong debugging and systems-thinking ability
AI/ML Experience
- Multi-agent systems
- RAG architectures
- Fine-tuning pipelines
- Embeddings and vector databases
- Tool-calling frameworks
- Model evaluation and benchmarking
- Prompt orchestration and workflow systems
Infrastructure Experience
- Docker
- Kubernetes
- GPU infrastructure
- CI/CD pipelines
- Cloud platforms (AWS/GCP/Azure)
- Distributed inference systems
What We're Looking For
We are specifically looking for engineers who:
- build things themselves
- move fast
- can go from idea to production
- understand both AI and systems engineering
- can architect and implement
- are comfortable operating in ambiguity
- care about performance and scalability
- are obsessed with execution
This is not a slide-deck architect role.
You should be able to:
- write production code daily
- review system bottlenecks
- optimize inference performance
- debug distributed systems
- build MVPs rapidly
- scale products into production systems
Bonus Points
- Experience building AI SaaS products from scratch
- Experience with agentic frameworks
- Experience with GPU optimization
- Contributions to open-source AI projects
- Experience with large-scale inference systems
- Startup experience
- Experience working with high-growth engineering teams
If you want to help shape the future of AI infrastructure and build systems that scale from startup experimentation to enterprise deployments, we'd love to talk.