SENIOR AI/ML ENGINEER
Experience Required: 5-9 Years
JOB SUMMARY
We are seeking a Senior AI/ML Engineer to lead the research, design, and development of advanced machine learning models and algorithms. This role focuses exclusively on the model development lifecycle, including architectural design, training methodology, optimization techniques, and algorithmic innovation, while collaborating with MLOps teams for production deployment.
The incumbent will solve complex business problems through predictive modeling, deep learning, and generative AI research. Responsibilities span from mathematical formulation and prototype development to training large-scale models (1B+ parameters) and rigorous evaluation. This position requires deep expertise in machine learning theory, neural network architectures, and statistical modeling, with emphasis on creating novel solutions rather than infrastructure management.
CORE RESPONSIBILITIES:
- Design and architect neural network models including Transformers, CNNs, RNNs, and hybrid architectures; make decisions on layer configurations, attention mechanisms, activation functions, and connectivity patterns for optimal performance.
- Develop and implement training algorithms and optimization strategies including custom loss functions, learning rate schedules, gradient clipping, and regularization techniques to ensure stable convergence and generalization.
- Fine-tune pre-trained foundation models (LLaMA, Mistral, BERT, GPT, T5) using Parameter-Efficient Fine-Tuning (PEFT) methods including LoRA, QLoRA, Prefix Tuning, and AdaLoRA for domain-specific applications.
- Implement Reinforcement Learning from Human Feedback (RLHF) and Constitutional AI methodologies; design reward models, policy optimization algorithms (PPO, DPO), and human preference learning systems.
- Engineer high-quality training datasets through data collection strategies, cleaning pipelines, augmentation techniques, and synthetic data generation; ensure data representativeness and bias mitigation.
- Design and execute comprehensive model evaluation frameworks including statistical significance testing, cross-validation strategies, benchmark dataset evaluation (MMLU, HumanEval, GLUE, SuperGLUE), and custom metrics development.
- Develop Retrieval-Augmented Generation (RAG) architectures including embedding model selection, retrieval algorithms, context integration strategies, and relevance scoring mechanisms to enhance model accuracy.
- Optimize model architectures for efficiency through knowledge distillation, model pruning, quantization-aware training, and neural architecture search (NAS) without compromising accuracy.
- Perform rigorous statistical analysis and hypothesis testing on model outputs; identify failure modes, error analysis, and edge cases requiring architectural improvements.
- Collaborate with domain experts to translate business requirements into mathematical formulations and ML problem statements; define target variables, feature spaces, and success criteria.
- Mentor junior researchers and engineers on machine learning theory, algorithmic best practices, experimental design, and research methodologies; conduct code reviews for model implementations.
- Document research findings, model architectures, training methodologies, and experimental results in technical reports; publish papers in conferences or journals and present at technical forums.
- Analyze model interpretability and explainability using attention visualization, SHAP values, LIME, and gradient-based attribution methods to ensure transparency in AI decision-making.
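For illustration only (not a requirement of the role), the relevance-scoring step behind the RAG retrieval work described above can be sketched in a few lines of plain Python. The function names and toy vectors here are invented for the example; a production retriever would use a trained embedding model and an approximate nearest-neighbor index.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two dense embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def retrieve_top_k(query_vec, doc_vecs, k=2):
    """Rank document embeddings by relevance to the query; return top-k indices."""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:k]
```

The retrieved contexts would then be concatenated into the prompt of the generator model, which is the "context integration" step the responsibility above refers to.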
ESSENTIAL QUALIFICATIONS & EXPERIENCE:
Educational Qualifications:
- Bachelor's degree (B.E./B.Tech) in Computer Science, Engineering, Mathematics, Statistics, Physics, or related quantitative field from a recognized university.
- Master's degree (M.Tech/MS) or PhD in Machine Learning, Artificial Intelligence, Computer Science, or related field highly desirable; exceptional candidates with Bachelor's degree and significant research experience may be considered.
- Strong foundation in linear algebra, calculus, probability theory, statistics, and optimization theory essential.
Experience Requirements:
- 5-9 years of research and development experience in applied machine learning, with demonstrable expertise in designing and training neural networks.
- Extensive experience in at least two domains: Natural Language Processing (NLP), Computer Vision, Speech Recognition, Recommendation Systems, or Reinforcement Learning.
- Research Publications: First-author publications in Tier-1 ML conferences (NeurIPS, ICML, ICLR, ACL, CVPR) or journals (JMLR, TPAMI, TACL) are preferred.
TECHNICAL COMPETENCIES REQUIRED:
Machine Learning Theory & Algorithms:
- Deep theoretical understanding of machine learning algorithms including supervised learning (SVMs, Random Forests, Gradient Boosting), unsupervised learning (clustering, dimensionality reduction, GMMs), and deep learning architectures.
- Expert-level proficiency in PyTorch (strongly preferred) or TensorFlow for implementing custom models, loss functions, and training loops from first principles.
- Advanced knowledge of transformer architectures (BERT, GPT, T5, LLaMA, Mistral) including self-attention mechanisms, positional encodings, layer normalization, and feed-forward networks.
- Mathematical optimization: Gradient descent variants (SGD, Adam, AdamW, LAMB), learning rate scheduling (cosine annealing, warm restarts), second-order methods, and convex/non-convex optimization theory.
- Statistical modeling: Bayesian methods, probabilistic graphical models, hypothesis testing, confidence intervals, and experimental design (A/B testing, factorial designs).
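As a concrete (purely illustrative) instance of the learning rate scheduling knowledge listed above, cosine annealing decays the learning rate from a maximum to a minimum over a training run. A minimal stdlib-only sketch, with parameter names of our own choosing:

```python
import math

def cosine_annealing_lr(step, total_steps, lr_max, lr_min=0.0):
    """Cosine-annealed learning rate: lr_max at step 0, decaying to lr_min
    at total_steps along a half-cosine curve."""
    cos_factor = 0.5 * (1 + math.cos(math.pi * step / total_steps))
    return lr_min + (lr_max - lr_min) * cos_factor
```

Warm restarts (also mentioned above) simply reset `step` to 0 periodically so the schedule repeats.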
Deep Learning & Generative AI:
- Fine-tuning methodologies: Full fine-tuning, freezing strategies, layer-wise learning rates, discriminative fine-tuning, and Parameter-Efficient Fine-Tuning (LoRA, QLoRA, Prefix Tuning, P-tuning, Adapter layers).
- Reinforcement Learning: Policy gradient methods (REINFORCE, A2C, PPO), Q-learning, actor-critic architectures, and RLHF implementation for language models.
- Generative modeling: Understanding of VAEs, GANs, diffusion models, and autoregressive generation techniques.
- Model compression: Knowledge distillation, pruning (structured/unstructured), quantization-aware training, and neural architecture search (NAS).
- Embedding techniques: Word2Vec, GloVe, FastText, contextual embeddings (ELMo, BERT), sentence embeddings (Sentence-BERT, SimCSE), and contrastive learning.
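To make the model compression item above concrete, here is an illustrative sketch of symmetric per-tensor int8 quantization (one of the simplest schemes in that family); real quantization-aware training operates on tensors inside the training loop, but the mapping itself is just a scale and a rounding step. All names are invented for the example:

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: map floats into [-127, 127]
    using a single scale derived from the largest-magnitude weight."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 codes and the scale."""
    return [qi * scale for qi in q]
```

The reconstruction error is bounded by half the scale per weight, which is why outlier weights (a large max magnitude) degrade int8 accuracy and motivate per-channel or clipped variants.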
Research & Development Tools:
- Experiment tracking: MLflow, Weights & Biases, or Neptune for logging experiments, hyperparameters, and results.
- Data manipulation: Pandas, NumPy, SciPy, Scikit-Learn for data analysis and classical ML algorithms.
- NLP libraries: Hugging Face Transformers, Tokenizers, Datasets library, SpaCy, NLTK.
- Visualization: Matplotlib, Seaborn, Plotly, TensorBoard for model visualization and performance analysis.
- Version control: Git for code management; DVC for data and model versioning.
- Hardware Awareness: Understanding of GPU/TPU architecture implications for model design (memory constraints, mixed-precision training, model parallelism strategies) without responsibility for infrastructure setup.
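As one example of the hardware awareness expected above, a common back-of-envelope estimate for fp32 training memory is 16 bytes per parameter (4 for the weight, 4 for its gradient, 8 for Adam's two moment buffers). The sketch below encodes that rule of thumb; it deliberately ignores activations, buffers, and mixed-precision master-weight copies, and uses 1 GB = 1e9 bytes for simplicity:

```python
def training_memory_gb(n_params, adam=True):
    """Rough fp32 training memory: weights + gradients (+ Adam m and v states).
    Simplified estimate: ignores activations and framework overhead;
    1 GB taken as 1e9 bytes."""
    bytes_per_param = 4 + 4 + (8 if adam else 0)
    return n_params * bytes_per_param / 1e9
```

By this estimate a 1B-parameter model needs on the order of 16 GB before activations, which is exactly the kind of constraint that motivates mixed precision and model parallelism in the work described above.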