Role Overview
We are looking for an EHR Clinical Data Integration Engineer who can bridge the gap between external stakeholder management and internal production engineering. You will own clinical data ingestion end to end, serving as the primary technical liaison to health systems to establish feeds, while simultaneously building the robust, production-grade data pipelines that power clinical platforms.
This role sits at the intersection of integration leadership, data engineering, and production readiness. You will drive the timeline for getting data in, and own the architecture that keeps that data reliable.
What You'll Do
Integration Leadership & Sourcing (Outward Facing)
- Lead Technical Engagement: Act as the primary technical liaison for external health system stakeholders. You will lead discussions to define data specs, reconcile gaps, and manage the integration lifecycle from kickoff to production.
- Establish Feeds: Configure and secure clinical data feeds from Epic, athenahealth, and partner systems using patterns such as FHIR APIs, HL7v2 feeds, SFTP/file exchange, or partner-specific mechanisms.
- Scope & Validate: Validate client-ready data requests and work with care teams to ensure requirements are comprehensive before technical implementation begins.
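To give a flavor of the feed-establishment work, a FHIR search is at heart a REST URL composed from a base endpoint and query parameters. The sketch below is purely illustrative; the base URL, resource, and parameters are hypothetical, and real integrations add authentication and per-partner quirks:

```python
from urllib.parse import urlencode

# Hypothetical FHIR base URL; real endpoints and auth vary by health system.
FHIR_BASE = "https://ehr.example.org/fhir/R4"

def fhir_search_url(resource: str, **params: str) -> str:
    """Build a FHIR REST search URL (parameters sorted for stable output)."""
    query = urlencode(sorted(params.items()))
    return f"{FHIR_BASE}/{resource}?{query}"

# Pull Observation resources (labs/vitals) for one patient, newest first.
url = fhir_search_url("Observation", patient="12345", category="laboratory",
                      _sort="-date", _count="50")
print(url)
```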
Data Engineering & Platform Integration (Internal Facing)
- Build Scalable Pipelines: Design and maintain production-grade ETL/ELT services using Python and Node.js. You will implement data engineering best practices (e.g., Medallion architecture concepts) to ensure data is organized, accessible, and performant.
- Orchestration & Automation: Utilize orchestration tools (e.g., Airflow, Dagster) or data platforms to automate workflows, ensuring data freshness and reliability.
- Platform Integration: Integrate pipelines with the clinical platform, ensuring secure handling of PHI, consistent data contracts, and seamless downstream consumption.
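For candidates unfamiliar with the medallion pattern mentioned above: it means landing raw payloads untouched (bronze), conforming them in an explicit cleaning stage (silver), and publishing analysis-ready outputs (gold). A toy sketch under assumed field names (real pipelines here run under an orchestrator with PHI controls):

```python
from datetime import date

# Toy medallion-style layering: bronze = raw as received,
# silver = cleaned/conformed, gold = analysis-ready. Field names hypothetical.
bronze = [
    {"mrn": " 001 ", "dob": "1980-05-02", "code": "loinc|2345-7"},
    {"mrn": "002",   "dob": "not-a-date", "code": "loinc|2345-7"},
]

def to_silver(rec):
    """Conform one raw record; return None if it fails basic validation."""
    try:
        dob = date.fromisoformat(rec["dob"])
    except ValueError:
        return None  # quarantine malformed rows rather than passing them on
    system, code = rec["code"].split("|", 1)
    return {"mrn": rec["mrn"].strip(), "dob": dob, "system": system, "code": code}

silver = [r for r in map(to_silver, bronze) if r is not None]
# gold: a simple rollup downstream consumers can query directly
gold = {"valid_records": len(silver), "rejected": len(bronze) - len(silver)}
print(gold)
```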
Data Quality & Reliability
- Validation Frameworks: Implement robust validation frameworks and QC checks (completeness, conformance, plausibility, reconciliation/backtesting) to catch issues before they reach production.
- Root Cause Analysis: Create repeatable workflows for debugging data issues across source systems, transformations, and downstream consumers.
- Observability: Define and monitor SLAs/SLOs for pipeline health using AWS CloudWatch (logs, metrics, alarms), ensuring high availability and rapid incident response.
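As a minimal illustration of the completeness/conformance checks described above (field names, the MRN format, and the promotion threshold are all hypothetical), a QC gate might decide whether a batch is fit to promote:

```python
import re

# Hypothetical QC gate: block a batch from promotion if completeness
# falls below a threshold. MRN format is an assumed 6-digit pattern.
MRN_PATTERN = re.compile(r"^\d{6}$")

def qc_report(records, required=("mrn", "dob"), min_complete=0.95):
    """Score a batch for completeness and conformance."""
    total = len(records)
    complete = sum(all(r.get(f) for f in required) for r in records)
    conformant = sum(bool(r.get("mrn") and MRN_PATTERN.match(r["mrn"]))
                     for r in records)
    return {
        "completeness": complete / total,
        "conformance": conformant / total,
        "promote": complete / total >= min_complete,
    }

batch = [{"mrn": "123456", "dob": "1980-05-02"},
         {"mrn": "12X456", "dob": "1979-01-15"},
         {"mrn": "654321", "dob": None}]
report = qc_report(batch)
print(report)
```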
Requirements
Must Have
- Experience: 3 to 4 years in data engineering, backend engineering, or integration engineering.
- Healthcare Data Expertise: Direct experience establishing clinical data feeds from Epic, athenahealth, Cerner, or eClinicalWorks, with a deep understanding of data standards (FHIR, HL7v2, CCD/C-CDA).
- Strong Data Engineering Fundamentals: Experience building and maintaining scalable data pipelines using modern orchestration tools (Airflow, Dagster) or data platforms (Databricks, dbt). Candidates should understand concepts like medallion architecture.
- Strong Programming Skills:
- Python: For ETL, validation tooling, and backend services.
- Node.js/JavaScript: For building integration services and APIs.
- SQL: For complex querying, profiling, and reconciliation.
- Data Quality Mindset: Proven track record of building automated QC checks and validation logic that prevents bad data from entering production.
- Stakeholder Management: Experience acting as a technical lead or liaison, managing timelines and expectations with external partners or clients.
- AWS Production Operations: Comfortable operating workloads in AWS (IAM, networking) and using CloudWatch for observability (logs, metrics, dashboards).
Nice to Have
- Advanced Observability: Experience with Grafana, Datadog, Prometheus, or OpenTelemetry.
- Distributed Processing: Experience with PySpark or high-volume ingestion patterns.
- Infrastructure as Code: Exposure to Terraform, CloudFormation, or similar tools.
- Containerization: Experience with Docker, Kubernetes (EKS), and CI/CD (GitHub Actions).
- AI/LLM-Assisted Automation: Interest or experience in using agentic workflows (e.g., LangChain, AutoGPT, or custom LLM scripts) to automate data mapping, schema reconciliation, or documentation tasks.
What Success Looks Like (First 60 to 90 Days)
- Map the Architecture: Establish a clear picture of clinical data flows, data contracts, and integration architecture.
- Ship Reliability: Implement at least one measurable improvement to data quality (e.g., better QC checks, automated backtesting, or reduced incident volume).
- Improve Operations: Reduce time-to-detect/time-to-resolve for data issues via better CloudWatch dashboards and runbooks.
- Streamline Onboarding: Improve the onboarding process for new health system integrations with clearer technical specifications and reusable integration patterns.