Description
We're seeking a highly skilled, hands-on Data Scientist with 4–10 years of experience in applied AI/ML to join our fast-paced team. This role requires deep expertise in transformer architectures and strong fundamentals in model training, fine-tuning, and optimization. You'll work across modalities (text, audio, video), with the flexibility to specialize in one domain but the adaptability to experiment across others.
The ideal candidate thrives in a startup-style, high-velocity R&D environment, is execution-focused, and demonstrates ownership from architecture to deployment. You'll run rapid experiments, iterate on state-of-the-art models, and push the boundaries of generative AI in lip-sync, character consistency, audio realism, and video quality with a research-first, problem-solving mindset.
Responsibilities
- Model Development & Fine-tuning: Run end-to-end experiments on transformer-based architectures and training techniques (LLMs, Whisper, diffusion models, LoRA, RLHF/SFT, multimodal models).
- Domain-Specific Applications:
  - Audio: Lip-sync, emotional delivery (shouting, whispering, crying), regional language support.
  - Video: Scene/character consistency, quality benchmarks comparable to Veo3/Sora.
  - Text: Extend LLMs to handle regional languages and domain-specific adaptation.
- Evaluation & Optimization: Design automated evaluation frameworks for objective quality scoring (images, video frames, audio clips). Balance trade-offs in speed, quality, and efficiency.
- Cross-Modality Integration: Experiment with audio-video synchronization, background score integration, and text-to-video alignment.
- Research & Experimentation: Stay ahead of rapidly evolving models and tools, testing architectural variations and scaling solutions for production use.
- Ownership & Execution: Drive initiatives independently with strong problem-solving, accountability, and first-principles thinking.
Requirements
- Experience: 4–10 years in applied Data Science/ML with a strong focus on generative AI.
- Core Fundamentals: Solid grasp of transformer architectures, LLMs, training dynamics, and optimization techniques.
- Modality Depth: Expertise in at least one modality (text, audio, or video), with demonstrable end-to-end project experience.
- Hands-On Skills: Strong coding and debugging ability in Python, with deep learning frameworks (PyTorch, TensorFlow).
- Deployment Knowledge: Experience with ML pipelines (FastAPI or similar) for inference and deployment.
- Evaluation Metrics: Proven ability to design/implement automated evaluation methods for generative outputs.
- Adaptability: Ability to experiment quickly with new tools, libraries, and models in a dynamic environment.
Benefits
What you get
- Best-in-class salary: We hire only the best, and we pay accordingly.
- Proximity Talks: Meet other designers, engineers, and product geeks and learn from experts in the field.
- Keep on learning with a world-class team: Work with the best in the field, challenge yourself constantly, and learn something new every day.