Senior Data & Infrastructure Engineer

neuranx.ai

Chennai, India

3-5 Years

Posted 3 hours ago

Job Description

Company Description

NeuraNx.ai is a cutting-edge technology company specializing in developing AI-powered products and advanced cloud solutions for innovative businesses. The organization focuses on delivering intelligent multi-agent systems, RAG architectures, and cloud infrastructures on platforms like Azure, AWS, and GCP. With extensive expertise in AI product development, intelligent automation, and enterprise AI consulting, NeuraNx.ai empowers businesses to accelerate processes, enhance automation, and scale efficiently. Built by experienced practitioners with a proven track record across industries like automotive, aviation, finance, and energy, NeuraNx.ai is dedicated to making intelligent technology accessible and impactful. Join us to shape the future of AI and cloud technology together.

About the role

You own the data that determines the accuracy ceiling of every ML component, along with the annotation pipeline, synthetic data generation, and the correction feedback loop. This role also backstops infrastructure when the NLP & Platform Engineer needs support. It is the most cross-functional role on the team: data engineer, annotation lead, and infrastructure generalist.

What you'll do

Data & annotation (60%):

Source legal audio (3K hours of SCOTUS recordings, C-SPAN, client recordings)
Build ingestion pipeline (audio normalization, transcript alignment, train/dev/test splitting)
Set up annotation tooling (Label Studio or similar) and manage 2-3 contract annotators
Curate the evaluation test set (20-50 segments with certified court reporter ground truth)
Build legal text corpus (500M+ words) from CourtListener, briefs, statutes
Implement TTS-based augmentation and speed/pitch perturbation
Build correction feedback loop (human edits → diff → fine-tuning data)
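
For illustration only, the "human edits → diff → fine-tuning data" step could be sketched as below. This is a minimal sketch, not NeuraNx.ai's actual pipeline: it assumes plain-text transcripts, whitespace tokenization, and Python's standard-library difflib. The function name `correction_pairs` and the record fields `op`, `asr`, and `corrected` are hypothetical.

```python
import difflib


def correction_pairs(asr_text, edited_text):
    """Diff an ASR transcript against its human-edited version.

    Returns one record per changed word span. Unchanged spans are skipped,
    so the output holds only the corrections a model could learn from.
    Assumption: whitespace tokenization is good enough for this sketch.
    """
    asr_words = asr_text.split()
    edited_words = edited_text.split()

    # autojunk=False keeps difflib from treating frequent words as junk
    # on long transcripts.
    matcher = difflib.SequenceMatcher(a=asr_words, b=edited_words, autojunk=False)

    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        pairs.append(
            {
                "op": tag,  # "replace", "delete", or "insert"
                "asr": " ".join(asr_words[i1:i2]),
                "corrected": " ".join(edited_words[j1:j2]),
            }
        )
    return pairs
```

Records like these could then be filtered (for example, dropping punctuation-only edits) before being added to a fine-tuning set. That filtering policy is a design choice the posting does not specify.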

Infrastructure & platform support (40%):

Help maintain Docker, CI/CD, and deployment scripts
Set up model registry and experiment tracking (MLflow or similar)
Build data quality dashboards and metrics reporting
Backstop the NLP & Platform Engineer on infrastructure tasks during crunches
Support GPU server setup and configuration

What you bring

3+ years in data engineering, ML data pipelines, or similar
Experience building annotation pipelines and managing annotation quality
Strong Python data processing (pandas, audio libraries, text normalization)
Web scraping and data cleaning at scale
Comfortable with audio data (format conversion, segmentation, alignment)
Enough infrastructure knowledge to be dangerous (Docker, basic cloud/on-prem concepts)
Exceptionally organized — you manage the team's most valuable asset (data)

Nice to have