neurodiscovery ai

Lead Data Engineer

  • Posted 24 days ago
  • Be among the first 10 applicants

Job Description

About The Role

We're looking for a seasoned Lead Data Engineer to own and drive our cloud-native data platform development end-to-end. This is a high-impact, hands-on leadership role where you'll architect scalable data and database systems, ship production-grade pipelines, and guide a growing team — all while keeping a sharp eye on business outcomes.

You'll tackle engineering challenges across distributed systems, large-scale databases, and multi-cloud data infrastructure. If you thrive at the intersection of deep systems-level work and cross-functional collaboration, this role is for you.

What You'll Do

  • Architect & Build: Design, implement, and maintain scalable, production-grade data platforms across multi-cloud, multi-tenant environments (AWS, Azure, GCP). Build database and storage solutions that work seamlessly across cloud providers and diverse deployment models.
  • Scale Database Systems: Own the design and operation of database infrastructure supporting a large number of tables, high-throughput operations, and complex query workloads — scaling through 100x+ growth while maintaining reliability and performance.
  • Lead Delivery: Own project timelines, priorities, and stakeholder communication. Drive data engineering initiatives from ideation through production with a bias for outcomes over activity.
  • Set Technical Direction: Define data architecture standards, tooling choices, and engineering best practices. Make critical build vs. buy decisions for data and database technologies.
  • Mentor & Grow the Team: Provide technical mentorship, conduct code reviews, and help shape a high-performing data engineering culture.
  • Collaborate Cross-Functionally: Partner closely with product, analytics, ML/AI, platform, and infrastructure teams to ensure data systems power real business value.
  • Operate with Ownership: Monitor data quality, pipeline reliability, and platform health. Own what you build from design through decommission. Treat production like a product.
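To make the "monitor data quality, treat production like a product" idea concrete, here is a minimal sketch of a batch quality gate of the kind such a platform might run before publishing data downstream. The function and check names are hypothetical, not part of this role's actual stack:

```python
from dataclasses import dataclass

@dataclass
class QualityResult:
    check: str
    passed: bool
    detail: str

def run_quality_gate(rows, required_fields, max_null_rate=0.01):
    """Run simple completeness checks before a batch is published downstream."""
    results = []
    total = len(rows)
    # Row-count check: an empty batch usually signals an upstream failure.
    results.append(QualityResult("non_empty", total > 0, f"{total} rows"))
    # Null-rate check per required field, against a configurable threshold.
    for field in required_fields:
        nulls = sum(1 for r in rows if r.get(field) is None)
        rate = nulls / total if total else 1.0
        results.append(QualityResult(
            f"null_rate:{field}", rate <= max_null_rate, f"{rate:.2%} null"))
    return results
```

In practice a gate like this would sit as a task in an orchestrator (Airflow, Dagster) and fail the pipeline run when any check fails, rather than letting bad data propagate.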

What You Bring

Required

  • 6+ years of hands-on experience in cloud-native data engineering, spanning ingestion, transformation, orchestration, storage, governance, and observability.
  • Deep expertise in modern distributed systems — you understand consensus, partitioning, replication, and fault tolerance, and have built or operated distributed data infrastructure at scale.
  • Scalable database architecture — proven experience designing and managing database systems with a large number of tables, high-volume OLTP/OLAP workloads, and complex operational patterns. You've scaled databases through massive growth at high-growth companies.
  • 1+ years of project management experience — you've owned roadmaps, managed delivery timelines, coordinated across teams, and are comfortable with tools like Jira.
  • Deep expertise in scalable, multi-cloud, multi-tenant data architecture — you understand the trade-offs and have built systems that serve diverse workloads across GCP, AWS, and Azure, and across first-party and third-party deployment models.
  • Strong proficiency in modern data stack technologies such as Spark, Kafka, Airflow/Dagster, dbt, Snowflake, Databricks, Delta Lake/Iceberg, or equivalent.
  • Deep experience with distributed database systems — PostgreSQL, MySQL, DynamoDB, or similar — including performance tuning, schema design at scale, and operational reliability.
  • Proficiency in Python, SQL, and Java/Scala, plus at least one infrastructure-as-code framework (Terraform, Pulumi, etc.).
  • Experience with data quality, data profiling, data integration, and data governance — you can engineer solutions that ensure secure and consistent data consumption across platforms.
  • A production-first, outcome-oriented mindset — you measure success by what's running reliably in production, not by what's in a slide deck. Customer value over story-point velocity.
  • Excellent communication skills — you can translate complex technical concepts for both engineering peers and business stakeholders.
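As a small illustration of the partitioning and replication fundamentals the list above asks for, here is a toy hash-partitioner with a simple consecutive-node replica placement scheme. This is a generic sketch for intuition, not the implementation used by any specific system mentioned in this posting:

```python
import hashlib

def partition_for(key: str, num_partitions: int) -> int:
    """Deterministically map a record key to a partition (hash partitioning)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions

def replicas_for(partition: int, num_nodes: int, replication_factor: int = 3):
    """Place a partition's replicas on consecutive nodes, a common simple scheme."""
    return [(partition + i) % num_nodes for i in range(replication_factor)]
```

Determinism matters here: the same key always lands on the same partition, which is what lets readers and writers agree on data placement without coordination.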

Preferred

  • 1+ years of tech/data team management experience — you've directly managed engineers, run standups, handled performance conversations, and built team culture.
  • Experience with the AI-native stack — vector databases (Pinecone, Weaviate, pgvector), RAG pipelines, feature stores, LLM orchestration frameworks (LangChain, LlamaIndex), and ML pipeline tooling (MLflow, Kubeflow, SageMaker).
  • Background in the healthcare/life sciences domain — familiarity with HL7/FHIR, HIPAA/GxP compliance, EHR/EMR data, clinical data models, claims/patient data, health data interoperability standards, or experience processing large volumes of commercial and medical data.
  • Experience with Redis, Temporal, async job processing frameworks, or other infrastructure supporting high-throughput distributed workloads.
  • Experience with real-time/streaming architectures (Kafka Streams, Flink, Spark Structured Streaming).
  • Track record of building multi-cloud or hybrid cloud database solutions.
  • Experience with database orchestration and automation at scale.
  • Familiarity with data mesh or data product paradigms.
  • Strong testing discipline — experience creating comprehensive automated unit and integration tests for data systems.
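For readers unfamiliar with the vector-database item above: at its core, a vector store ranks documents by embedding similarity. The following brute-force cosine-similarity search is a deliberately naive sketch (real systems like Pinecone or pgvector use approximate-nearest-neighbour indexes); all names and the toy vectors are illustrative:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query, corpus, k=3):
    """Return the ids of the k corpus vectors most similar to the query."""
    scored = sorted(corpus.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]
```

A RAG pipeline wraps exactly this retrieval step: embed the query, fetch the top-k documents, and pass them to an LLM as context.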


Job ID: 145940213
