Company Description
a21.ai specializes in delivering cutting-edge AI and Generative AI solutions tailored to diverse industries. Our core expertise includes AI strategy, engineering, custom development, and successful implementation of innovative projects. We offer comprehensive services in Generative AI, LLMs, prompt engineering, and secure data handling, alongside accelerators for seamless integration and analytics. At a21.ai, we thrive on elevating intelligence and enabling transformative growth.
Role Overview
We are looking for a Cloud Engineer with strong Azure expertise to design, deploy, and scale cloud-native infrastructure for generative AI workloads.
This is not a traditional DevOps role. You will be working on:
- Multi-LLM systems
- High-throughput inference pipelines
- Vector databases
- Enterprise-grade AI deployments
If your experience is limited to basic CI/CD pipelines, this role will stretch you.
Key Responsibilities
Cloud Infrastructure (Core)
- Design and manage scalable infrastructure on Microsoft Azure
- Work with services like:
  - Azure Kubernetes Service (AKS)
  - Azure Virtual Machines / VMSS
  - Azure Blob Storage / Data Lake
  - Azure Networking (VNet, NSG, Private Endpoints)
- Optimize cloud architecture for performance, cost, and reliability
AI/ML Infrastructure
- Deploy and manage:
  - LLM APIs (Azure OpenAI, external APIs)
  - Vector databases (Pinecone, Weaviate, or similar)
- Build infrastructure for:
  - RAG pipelines
  - Batch + real-time inference
  - Multi-model orchestration systems
DevOps & Automation
- Implement CI/CD pipelines (GitHub Actions / Azure DevOps)
- Infrastructure as Code using:
  - Terraform (preferred)
  - ARM templates / Bicep (good to have)
- Automate deployments, scaling, and monitoring
Containers & Orchestration
- Containerization using Docker
- Orchestration using Kubernetes (AKS preferred)
- Manage:
  - Autoscaling
  - Load balancing
  - Service mesh (optional but valuable)
Monitoring & Reliability
- Implement observability using:
  - Prometheus, Grafana, ELK
  - Azure Monitor / Application Insights
- Ensure:
  - High availability
  - Fault tolerance
  - SLA adherence
Security & Compliance
- Implement best practices:
  - Identity & access management (IAM)
  - Secrets management (Key Vault)
  - Network isolation
- Work on enterprise-grade deployments with data privacy constraints
Required Skills
- 3-7 years in Cloud / DevOps / SRE roles
- AWS certification at Associate level or above (at least Solutions Architect or Developer; DevOps Engineer Professional preferred). Foundational-level certifications do not qualify.
- Strong hands-on experience with AWS as well as Microsoft Azure
- Experience with:
  - Kubernetes (AKS preferred)
  - Docker
  - Terraform
- Good understanding of:
  - Networking (VPC/VNet, DNS, Load Balancers)
  - Linux systems
- Scripting ability (Python / Bash)
- Familiarity with CI/CD pipelines
Good-to-Have Skills
- Experience with Azure OpenAI / LLM deployments
- Exposure to:
  - Vector databases
  - RAG pipelines
- Experience working with:
  - Multi-cloud environments (AWS/GCP)
- Understanding of:
  - Distributed systems
  - High-scale data pipelines
What Makes This Role Different
Most DevOps roles maintain systems. This role builds the infrastructure layer for next-gen AI systems.
You will work on:
- Multi-LLM orchestration architectures
- AI-native infrastructure patterns
- Production-grade enterprise deployments
Who Should Apply
- Engineers who want to move beyond basic DevOps
- People interested in AI infrastructure, not just ML models
- Builders who care about scalability and system design
Compensation & Growth
- Competitive salary (based on experience)
- Fast growth into:
  - AI Infrastructure Architect
  - Platform Engineering Lead
- Direct exposure to enterprise AI deployments
How to Apply
Email [Confidential Information] with your resume + a short note on:
- Azure projects you've built
- Any infra you've designed end-to-end
Blunt Reality Check
- This is not a low-effort role
- You will deal with ambiguity and complex systems
- If you want predictable, repetitive DevOps work, this is not for you