About The Role
We're looking for a seasoned Lead Data Engineer to own and drive our cloud-native
data platform development end-to-end. This is a high-impact, hands-on leadership role
where you'll architect scalable data and database systems, ship production-grade
pipelines, and guide a growing team — all while keeping a sharp eye on business
outcomes.
You'll tackle engineering challenges across distributed systems, large-scale databases,
and multi-cloud data infrastructure. If you thrive at the intersection of deep systems-level
work and cross-functional collaboration, this role is for you.
What You'll Do
- Architect & Build: Design, implement, and maintain scalable, production-grade
data platforms across multi-cloud, multi-tenant environments (AWS, Azure,
GCP). Build database and storage solutions that work seamlessly across cloud
providers and diverse deployment models.
- Scale Database Systems: Own the design and operation of database
infrastructure supporting a large number of tables, high-throughput operations,
and complex query workloads — scaling through 100x+ growth while maintaining
reliability and performance.
- Lead Delivery: Own project timelines, priorities, and stakeholder communication.
Drive data engineering initiatives from ideation through production with a bias for
outcomes over activity.
- Set Technical Direction: Define data architecture standards, tooling choices,
and engineering best practices. Make critical build vs. buy decisions for data and
database technologies.
- Mentor & Grow the Team: Provide technical mentorship, conduct code reviews,
and help shape a high-performing data engineering culture.
- Collaborate Cross-Functionally: Partner closely with product, analytics, ML/AI,
platform, and infrastructure teams to ensure data systems power real business
value.
- Operate with Ownership: Monitor data quality, pipeline reliability, and platform
health. Own what you build from design through decommission. Treat production
like a product.
What You Bring
Required
- 6+ years of hands-on experience in cloud-native data engineering, spanning
ingestion, transformation, orchestration, storage, governance, and observability.
- Deep expertise in modern distributed systems — you understand consensus,
partitioning, replication, fault tolerance, and have built or operated distributed
data infrastructure at scale.
- Scalable database architecture — proven experience designing and managing
database systems with a large number of tables, high-volume OLTP/OLAP
workloads, and complex operational patterns. You've scaled databases through
periods of rapid, sustained growth.
- 1+ years of project management experience — you've owned roadmaps,
managed delivery timelines, coordinated across teams, and are comfortable with
tools like Jira.
- Deep expertise in scalable, multi-cloud, multi-tenant data architecture — you
understand the trade-offs and have built systems that serve diverse workloads
across GCP, AWS, Azure, first-party and third-party deployment models.
- Strong proficiency in modern data stack technologies such as Spark, Kafka,
Airflow/Dagster, dbt, Snowflake, Databricks, Delta Lake/Iceberg, or
equivalent.
- Deep experience with distributed database systems — PostgreSQL, MySQL,
DynamoDB, or similar — including performance tuning, schema design at scale,
and operational reliability.
- Proficiency in Python, SQL, and Java/Scala, and at least one
infrastructure-as-code framework (Terraform, Pulumi, etc.).
- Experience with data quality, data profiling, data integration, and data
governance — you can engineer solutions that ensure secure and consistent
data consumption across platforms.
- A production-first, outcome-oriented mindset — you measure success by
what's running reliably in production, not by what's in a slide deck. Customer
value over story-point velocity.
- Excellent communication skills — you can translate complex technical concepts
for both engineering peers and business stakeholders.
Preferred
- 1+ years of tech/data team management experience — you've directly
managed engineers, run standups, handled performance conversations, and built
team culture.
- Experience with the AI-native stack — vector databases (Pinecone, Weaviate,
pgvector), RAG pipelines, feature stores, LLM orchestration frameworks
(LangChain, LlamaIndex), and ML pipeline tooling (MLflow, Kubeflow,
SageMaker).
- Background in the healthcare / life sciences domain — familiarity with
HL7/FHIR, HIPAA/GxP compliance, EHR/EMR data, clinical data models,
claims/patient data, health data interoperability standards, or experience
processing large volumes of commercial and medical data.
- Experience with Redis, Temporal, async job processing frameworks, or other
infrastructure supporting high-throughput distributed workloads.
- Experience with real-time/streaming architectures (Kafka Streams, Flink,
Spark Structured Streaming).
- Track record of building multi-cloud or hybrid cloud database solutions.
- Experience with database orchestration and automation at scale.
- Familiarity with data mesh or data product paradigms.
- Strong testing discipline — experience creating comprehensive automated unit
and integration tests for data systems.