About The Opportunity
We are a fast-scaling, AI-driven language technology firm operating at the intersection of speech analytics, multilingual NLP, and conversational intelligence, and we are building next-generation transcription and localization systems for South Asian languages. Our platform powers real-time, dialect-aware transcription for enterprises, governments, and content platforms across India, with a special focus on under-resourced regional languages including Rangpuri, Bhojpuri, and Maithili.
Role & Responsibilities
- Accurately transcribe spoken Rangpuri audio into written text, maintaining natural speech patterns, pauses, and tone markers.
- Tag and annotate audio files with speaker IDs, non-speech elements (laughter, hesitation, background noise), and emotional cues for model training.
- Validate and correct AI-generated Rangpuri transcripts for dialectal accuracy, grammar, and contextual relevance.
- Collaborate with linguists and ML engineers to refine transcription guidelines and expand dialectal coverage within the Rangpuri corpus.
- Flag inconsistencies in audio quality and metadata to ensure clean, high-utility datasets for model improvement.
- Adhere to strict confidentiality, data governance, and quality benchmarks while working on sensitive or proprietary recordings.
Skills & Qualifications
Must-Have
- Rangpuri (native or near-native fluency with strong literacy in Devanagari or Bengali script)
- Transcription software experience (e.g., Otter.ai, Descript, Rev, or similar)
- Dialectal awareness (ability to distinguish sub-regional Rangpuri variations)
- Attention to phonetic accuracy for non-standard pronunciation
- Proficiency in MS Excel or Google Sheets for metadata tagging
- Experience with audio annotation or linguistic data cleaning
Preferred
- Basic understanding of linguistic concepts (phonemes, prosody, intonation)
- Experience transcribing for AI/ML training datasets
- Familiarity with Indian regional languages (e.g., Bengali, Assamese, Maithili)
Benefits & Culture Highlights
- 100% remote work with flexible hours — choose your own schedule
- Competitive pay per audio hour with performance bonuses for accuracy and speed
- Be part of building India's first large-scale Rangpuri NLP dataset — your work powers real-world AI inclusion