Experience: 4+ Years
Location: Bengaluru, Chennai (Hybrid)
Team: Data Science & AI
About Akaike Technologies
At Akaike Technologies, we are redefining the boundaries of enterprise intelligence. We are seeking a highly specialized Senior Data Scientist who thrives at the intersection of Generative AI and Classical Machine Learning.
This role is designed for a practitioner who does not just call APIs but understands the mathematics behind Transformers and can architect complex, high-accuracy Agentic systems. You will spend roughly 60% of your time on Generative AI (Agents, RAG, SQL-Gen) and 40% on robust Classical ML/Deep Learning (Forecasting, Classification, Custom Architectures), all backed by a strong PySpark data foundation.
Key Responsibilities
- Generative AI & Agentic Systems (60% Focus)
SQL-Based Agent Architecting: Design and deploy highly accurate Text-to-SQL agents that can query complex enterprise databases with precision. Focus on schema linking, error handling, and self-correction mechanisms.
Multi-Agent Systems: Build sophisticated Agentic Workflows using patterns like ReAct and Agent-Critique. Orchestrate systems where agents collaborate (using frameworks like LangGraph or CrewAI) to critique and improve each other's outputs before final execution.
RAG & Long-Context Optimization: Develop production-grade Retrieval Augmented Generation (RAG) systems. Optimize chunking strategies, vector search (Pinecone/Milvus/Weaviate), and re-ranking algorithms to minimize hallucinations.
LLM Evaluation & Fine-Tuning: Move beyond basic prompting. Implement LLM-as-a-judge evaluation frameworks to quantitatively measure agent accuracy. Perform Parameter-Efficient Fine-Tuning (PEFT/LoRA) on open-source models (Llama 3, Mistral) for domain-specific tasks.
- Classical ML, Deep Learning & Transformer Governance (40% Focus)
Transformer Internals: Demonstrate deep command of Transformer architectures. Go beyond pre-trained models to design custom loss functions or modify attention mechanisms to address specific data nuances.
Custom Business Modeling: Build bespoke predictive models for complex business scenarios such as Targeting, Budget Optimization, and Churn, where off-the-shelf solutions fail.
Advanced Deep Learning: Utilize 1D/2D CNNs, LSTMs, and Representation Learning for complex pattern recognition in non-text data (time-series, behavioral logs).
Sparsity & Nuance: Handle real-world data challenges, including Positive-Unlabeled (PU) learning and single-class learning.
- Data Science at Scale (PySpark & Databricks)
Billion-Scale Processing: You are not reliant on Data Engineers for every table. You must comfortably write optimized PySpark/SparkSQL jobs on Databricks to process billions of rows for training data creation.
Feature Engineering: Build complex feature stores in a distributed environment, ensuring consistency between training and inference.
- Architecture, MLOps & Lifecycle Management
System Architecture Design: Architect end-to-end ML systems, making critical trade-off decisions between latency, cost, and accuracy. Design modular components for reusability and scalability across the organization.
A/B Testing & Measurement: Design and execute rigorous A/B tests (or Interleaved testing) to validate model impact in production. Define clear success metrics (offline proxies vs. online business KPIs) and ensure statistical significance of results.
Continuous Improvement (CI/CD/CT): Establish feedback loops for model monitoring. Detect data drift and concept drift, and implement automated retraining strategies to ensure models improve continuously over time.
Serverless Pipelines: Design scalable deployment pipelines utilizing AWS Lambda, Step Functions, and FastAPI for event-driven and real-time inference.
- Leadership & Stakeholder Management
Strategic Problem Formulation: Proactively identify opportunities to leverage data science by analyzing product roadmaps and market scenarios. Translate abstract business goals (e.g., maximize user engagement or reduce market spend) into concrete, solvable mathematical problems.
Technical Mentorship: Actively mentor junior data scientists. Conduct rigorous code reviews, enforce design patterns, and foster a culture of engineering excellence within the team.
Client Handling: Serve as the primary technical point of contact for clients. Explain model limitations transparently to non-technical stakeholders and manage expectations regarding AI capabilities.
Must-Have Skills
Core Technical Stack:
GenAI Frameworks: Advanced proficiency with LangChain, LlamaIndex, or DSPy. Experience building agents that interact with SQL databases is critical.
Deep Learning: PyTorch or TensorFlow. Deep understanding of Attention mechanisms, Encoder-Decoder architectures, and Embeddings.
Big Data: Expert-level PySpark and SQL. Ability to debug Spark jobs and optimize partitions/shuffles on Databricks.
Programming: Python (OOP, typing, rigorous code standards).
Experience & Soft Skills
Proven track record of deploying at least one Agentic System or complex RAG pipeline to production.
Experience treating SQL as a first-class citizen in GenAI workflows (Text-to-SQL).
Experimental Mindset: Strong grasp of statistical testing, experimental design, and metrics evaluation (A/B testing).
Strong Communication: Ability to articulate complex technical concepts to business leaders without oversimplifying the risks.
Nice to Have
Experience with vLLM or TGI for serving open-source models.
Knowledge of Knowledge Graphs (Neo4j) combined with LLMs (GraphRAG).
Publications or active contributions to the Open Source AI community.
Benefits & Perks
Competitive Compensation & ESOPs.
Budget for compute (GPUs) for experimentation.
Sponsorship for top-tier AI conferences (NeurIPS, ICML, etc.).
A culture that values the Science in Data Science: we encourage reading papers and trying novel architectures.