The Factual holding co

Artificial Intelligence Engineer

This job is no longer accepting applications

  • Posted a month ago

Job Description

Founding AI Engineer (Video x Multimodal Models), AuraFarming

Location: Remote (India only)

Type: Full-time, Founding Team

About Us

We're building AuraFarming, the Cursor for video creation.

A next-generation AI video platform where users upload any video and describe the changes they want, and our system automatically recreates or clones it with new visuals, new products, and new music.

Think Sora + Veo + Gemini + Runway combined into one integrated creative IDE.

Our goal is to build the unified layer between video understanding and video generation.

The Role

We're hiring a Founding AI Engineer to lead our multimodal intelligence stack.

You'll design the pipeline that lets our platform watch, understand, and recreate videos.

Your work will define the heart of the system: transforming raw media into structured scene representations and turning them into generative prompts for top-tier models like Sora, Veo, and Runway.

This is a zero-to-one role: build, fine-tune, and iterate. You'll be working directly with the founders and the founding full-stack engineer.

Your Mission

Design the video understanding engine: detect scenes, subjects, motion, music, voice, and text.

Build the prompt compiler that converts user edits into model-ready JSON instructions.

Integrate Gemini 1.5 Pro, GPT-4o, and LLaVA-Video for multimodal reasoning.

Connect with video generation APIs: Sora, Veo, Runway, Higgsfield, WAN 2.5.

Prototype video-to-video delta generation (upload + describe, then regenerate).

Collaborate on backend integration and optimization for latency and cost.

Own R&D for diffusion-based and transformer-based video models (AnimateDiff, I2VGen-XL, VideoCrafter2).

Core Competencies

Model Integration

Experience with multimodal APIs (Gemini, GPT-4o, Claude, LLaVA).

Knowledge of diffusion pipelines (AnimateDiff, Stable Video Diffusion).

Ability to call and orchestrate video generation endpoints (Sora, Veo, Runway).

Computer Vision / Audio

Familiar with ffmpeg, frame extraction, CLIP embeddings, Whisper transcription.

Understands temporal modeling and scene segmentation.

Experience with image/video captioning, visual grounding, or action recognition.

Prompt Engineering & Reasoning

Design structured prompt schemas for multimodal models.

Ability to parse user deltas into JSON commands.

Experience fine-tuning or prompting LLMs for structured output.

Programming Stack

Python, PyTorch, FastAPI, Celery, ffmpeg, PostgreSQL.

Working familiarity with OpenAI, Google AI, Replicate, or RunPod APIs.

Mindset

Thinks in systems: how to turn raw media into data, not demos.

Ships fast and iterates with founders.

Wants to invent the next layer of creative AI, not just use existing APIs.

Target Tech Stack

Core Models: Gemini 1.5 Pro, GPT-4o, LLaVA-Video, AnimateDiff, Veo, Sora

Frameworks: PyTorch, FastAPI, Celery, ffmpeg

Infrastructure: RunPod, Modal, AWS, R2 Storage

Integrations: ElevenLabs (voice), Suno/Mubert (music), Stripe (credits)

Why Join AuraFarming

Ground-floor founding position with deep product ownership.

Direct involvement in cutting-edge multimodal video systems.

Work with founders shipping real generative products, not research demos.

Equity upside and full creative control over AI direction.

Fast execution culture: idea to prototype in days, not months.

Compensation

Competitive salary (India benchmark)

₹6-12 lakh per year (6L-12L/yr)

Founder-level equity allocation

Performance-linked upside

Who You Are

2-4+ years of experience in AI/ML or computer vision

1+ years of work with multimodal or diffusion models

Strong Python + PyTorch background

Experience shipping production-grade model pipelines

Self-sufficient, execution-first, comfortable building fast

How to Apply

Send your GitHub, LinkedIn, and a short note on:

  1. A generative or multimodal project you shipped.
  2. Your experience integrating or fine-tuning video/image models.

Apply here: https://tally.so/r/w89RjY

Job ID: 128872073