About The Role
We're looking for a seasoned Lead Data Engineer to own and drive our cloud-native
data platform development end-to-end. This is a high-impact, hands-on leadership role
where you'll architect scalable data and database systems, ship production-grade
pipelines, and guide a growing team — all while keeping a sharp eye on business
outcomes.
You'll tackle engineering challenges across distributed systems, large-scale databases,
and multi-cloud data infrastructure. If you thrive at the intersection of deep systems-level
work and cross-functional collaboration, this role is for you.
What You'll Do
- Architect & Build: Design, implement, and maintain scalable, production-grade
data platforms across multi-cloud, multi-tenant environments (AWS, Azure,
GCP). Build database and storage solutions that work seamlessly across cloud
providers and diverse deployment models.
- Scale Database Systems: Own the design and operation of database
infrastructure supporting a large number of tables, high-throughput operations,
and complex query workloads — scaling through 100x+ growth while maintaining
reliability and performance.
- Lead Delivery: Own project timelines, priorities, and stakeholder communication.
Drive data engineering initiatives from ideation through production with a bias for
outcomes over activity.
- Set Technical Direction: Define data architecture standards, tooling choices,
and engineering best practices. Make critical build vs. buy decisions for data and
database technologies.
- Mentor & Grow the Team: Provide technical mentorship, conduct code reviews,
and help shape a high-performing data engineering culture.
- Collaborate Cross-Functionally: Partner closely with product, analytics, ML/AI,
platform, and infrastructure teams to ensure data systems power real business
value.
- Operate with Ownership: Monitor data quality, pipeline reliability, and platform
health. Own what you build from design through decommission. Treat production
like a product.
What You Bring
Required
- 6+ years of hands-on experience in cloud-native data engineering, spanning
ingestion, transformation, orchestration, storage, governance, and observability.
- Deep expertise in modern distributed systems — you understand consensus,
partitioning, replication, fault tolerance, and have built or operated distributed
data infrastructure at scale.
- Scalable database architecture — proven experience designing and managing
database systems with a large number of tables, high-volume OLTP/OLAP
workloads, and complex operational patterns. You've scaled databases through
periods of rapid, sustained growth.
- 1+ years of project management experience — you've owned roadmaps,
managed delivery timelines, coordinated across teams, and are comfortable with
tools like Jira.
- Deep expertise in scalable, multi-cloud, multi-tenant data architecture — you
understand the trade-offs and have built systems that serve diverse workloads
across GCP, AWS, Azure, first-party and third-party deployment models.
- Strong proficiency in modern data stack technologies such as Spark, Kafka,
Airflow/Dagster, dbt, Snowflake, Databricks, Delta Lake/Iceberg, or
equivalent.
- Deep experience with distributed database systems — PostgreSQL, MySQL,
DynamoDB, or similar — including performance tuning, schema design at scale,
and operational reliability.
- Proficiency in Python, SQL, and Java/Scala, and at least one
infrastructure-as-code framework (Terraform, Pulumi, etc.).
- Experience with data quality, data profiling, data integration, and data
governance — you can engineer solutions that ensure secure and consistent
data consumption across platforms.
- A production-first, outcome-oriented mindset — you measure success by
what's running reliably in production, not by what's in a slide deck. Customer
value over story-point velocity.
- Excellent communication skills — you can translate complex technical concepts
for both engineering peers and business stakeholders.
Preferred
- 1+ years of tech/data team management experience — you've directly
managed engineers, run standups, handled performance conversations, and built
team culture.
- Experience with the AI-native stack — vector databases (Pinecone, Weaviate,
pgvector), RAG pipelines, feature stores, LLM orchestration frameworks
(LangChain, LlamaIndex), and ML pipeline tooling (MLflow, Kubeflow,
SageMaker).
- Background in the healthcare / life sciences domain — familiarity with
HL7/FHIR, HIPAA/GxP compliance, EHR/EMR data, clinical data models,
claims/patient data, health data interoperability standards, or experience
processing large volumes of commercial and medical data.
- Experience with Redis, Temporal, async job processing frameworks, or other
infrastructure supporting high-throughput distributed workloads.
- Experience with real-time/streaming architectures (Kafka Streams, Flink,
Spark Structured Streaming).
- Track record of building multi-cloud or hybrid cloud database solutions.
- Experience with database orchestration and automation at scale.
- Familiarity with data mesh or data product paradigms.
- Strong testing discipline — experience creating comprehensive automated unit
and integration tests for data systems.