
About MAKO:
Founded in 2013, Mako IT Lab is a global software development company with a strong presence across the USA, UK, India, and Nepal. Over the years, we've partnered with companies globally, helping them solve complex challenges and build meaningful digital experiences.
What truly defines Mako is our culture. We believe in creating an environment where people feel empowered to take ownership, exercise freedom in their ideas, and contribute to solutions that genuinely make an impact. Learning is at the heart of who we are: our teams constantly grow through hands-on exposure, real-world problem solving, and continuous knowledge sharing across functions and geographies.
We don't just build long-term partnerships with clients; we build long-term careers for our people. At Mako, you'll be part of a collaborative, supportive, and fast-growing global team where curiosity is encouraged, initiative is celebrated, and every individual plays a meaningful role in shaping the company's journey.
Role Overview:
We are seeking an experienced AI Engineer with deep expertise in LLM-driven architectures, RAG systems, agentic workflows, and multimodal AI development. The ideal candidate will be skilled in building scalable AI pipelines using FastAPI, Kafka, FastMCP, and Tavily Web Search, while also having hands-on experience with vLLM-based inference and Stable Diffusion pipelines. You will architect and implement intelligent systems leveraging Large Language Models, vision models, and autonomous agents, with a strong focus on observability, performance, and production reliability.
Key Responsibilities:
1. LLM, vLLM & Agentic System Development
Build autonomous LLM agents using LangChain, LangGraph, and FastMCP.
Develop RAG workflows using embeddings, vector stores, and knowledge-grounded reasoning.
Integrate vLLM / SGLang / other high-throughput inference backends for low-latency model serving.
Implement Tavily web-search integrations for real-time knowledge augmentation.
Optimize inference using quantized GGUF models, tensorized formats, and GPU-accelerated pipelines.
2. Multimodal & Image Generation Systems
Build and deploy Stable Diffusion (SDXL/SD 1.5/ControlNet/T2I) pipelines for image generation tasks.
Integrate LoRAs, control modules, and diffusion-based fine-tuning for custom domains.
Develop multimodal agents that combine LLM reasoning with vision tasks such as classification, captioning, or image prompts.
3. Backend & Infrastructure Engineering
Build robust FastAPI services for orchestrating LLMs, Stable Diffusion, retrieval, and agentic tasks.
Develop event-driven workflows using Kafka for distributed AI systems.
Implement auditing, agent-output monitoring, and API-layer logging for end-to-end traceability.
4. High-level API & Third-party Integrations
Integrate third-party services: authentication, analytics, search APIs, cloud inference APIs, and enterprise data sources.
Build secure and scalable API layers for production deployments.
5. Fine-tuning & Model Lifecycle Management
Fine-tune LLaMA, Mistral, Phi-3, and diffusion models for domain-specific tasks.
Use MLflow for tracking experiments, hyperparameters, metrics, and versioning.
Evaluate models for hallucinations, retrieval consistency, reasoning depth, and multimodal accuracy.
Required Skills & Qualifications:
Core AI/LLM Skills
Experience with LLMs, RAG systems, LangChain, LangGraph, LlamaIndex
Hands-on with vLLM, SGLang, or similar inference engines
Model quantization (GGUF), optimization, and GPU memory tuning
Agent frameworks & tool calling (FastMCP, Groq, Hugging Face)
Multimodal & Image Generation
Stable Diffusion, ControlNet, LoRA fine-tuning, custom pipelines
Diffusers, ComfyUI, or InvokeAI experience (bonus)
Engineering & Systems
Kafka-based event-driven systems
FastAPI/Flask/Node.js backend development
Third-party API integrations
Docker, CI/CD, and cloud platforms (GCP/Azure)
Databases & Retrieval
MongoDB, DuckDB
Embedding stores, vector databases (Pinecone / Qdrant), retrieval optimization
Observability & MLOps
MLflow for experiment tracking and model lifecycle
Performance monitoring, logging, auditing, API observability
Frontend (Good to have)
React, Redux, Next.js, Electron.js for dashboards and AI interfaces
Job ID: 133395367