
SteerLean Consulting

Senior Data Engineer

  • Posted 5 hours ago

Job Description

Responsibilities

  • Design and implement data pipelines, ETL processes, schemas, and data models to ingest, process, and prepare multi-petabyte scale datasets for downstream analytics and machine learning.
  • Build and optimize data processing systems on modern platforms such as Spark, Delta Lake, and Kafka.
  • Implement data quality, validation, and monitoring measures leveraging tools such as Great Expectations.
  • Ensure compliance with security, access control, and regulatory requirements related to PHI and other sensitive data types.
  • Support adoption of emerging standards like FHIR for healthcare data exchange.
  • Collaborate with data scientists, analysts, and engineers to understand data needs and deliver performant, reliable data products.
  • Track emerging technologies and trends in data engineering, incorporating modern tooling and best practices.
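The data quality responsibility above names Great Expectations; as a rough illustration of the kind of row-level checks such a tool automates, here is a minimal pure-Python sketch (the records, field names, and rules are hypothetical, and this is not the Great Expectations API):

```python
# Minimal sketch of row-level data quality checks of the kind that tools
# like Great Expectations automate. Records and rules are hypothetical.

def validate_records(records, rules):
    """Apply each named rule to every record; return failing row indices per rule."""
    failures = {name: [] for name in rules}
    for i, record in enumerate(records):
        for name, check in rules.items():
            if not check(record):
                failures[name].append(i)
    return failures

records = [
    {"patient_id": "p-001", "age": 42},
    {"patient_id": "", "age": 42},      # missing identifier
    {"patient_id": "p-003", "age": -5}, # out-of-range value
]

rules = {
    "patient_id_not_null": lambda r: bool(r.get("patient_id")),
    "age_in_range": lambda r: 0 <= r.get("age", -1) <= 130,
}

report = validate_records(records, rules)
print(report)  # {'patient_id_not_null': [1], 'age_in_range': [2]}
```

In a production pipeline these checks would run as a validation step between ingestion and publication, with failures routed to monitoring rather than printed.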

Qualifications

  • Experience in building and operating production big data platforms and pipelines
  • Strong experience with SQL, Spark, workflow orchestrators, distributed message buses, Python, Presto, Delta Lake, the Apache big data tool suite, Docker, Kubernetes, and MPP databases
  • Hands on with the design and implementation of cloud-based data solutions using platforms like Azure, AWS, or GCP, optimizing for scalability, cost-efficiency, and performance.
  • Implement and maintain data lakes, warehouses, and lakehouses, including data modeling, ETL processes, and data quality assurance to empower data-driven decision-making.
  • Develop real-time data pipelines using streaming technologies like Apache Kafka or Azure Event Hubs, enabling timely insights and actions from incoming data streams.
  • Manage and enhance distributed data systems (e.g., Hadoop, Spark) to efficiently process large-scale datasets, ensuring data availability and reliability.
  • Previous experience working with health data and the Azure cloud is a strong plus
  • Experience with Databricks or MS Fabric
  • Strong track record of designing and implementing scalable data models, schemas, and ETL logic
  • Experience with data governance, master data management, data pseudonymization and anonymization, and data catalog solutions.
  • A strong interest in learning new things and a team-player ethic.
  • Strong analytical skills and good understanding of data structures and algorithms.
  • Some exposure to Nextflow and/or Nextflow Tower
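The qualifications above call for experience with data pseudonymization. One common approach, sketched minimally here, is to replace a direct identifier with a keyed hash (HMAC-SHA256): the same input always maps to the same pseudonym, so joins across datasets still work, but the mapping cannot be reversed without the secret key. The key and field names below are hypothetical placeholders:

```python
import hmac
import hashlib

# Pseudonymization sketch: HMAC-SHA256 with a secret key, so the mapping is
# deterministic but not reversible without the key. The key and record
# fields are illustrative placeholders only.
SECRET_KEY = b"rotate-me-and-store-in-a-vault"  # placeholder, not a real key

def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]  # truncated for readability

record = {"patient_id": "p-001", "diagnosis": "J45"}
safe = {**record, "patient_id": pseudonymize(record["patient_id"])}
```

Note that true anonymization is stronger: a keyed hash is only pseudonymization, since holders of the key can re-identify records, which is why key management and access control matter here.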

Nice To Have

  • Experience building data pipelines for machine learning.
  • Knowledge of genomics, medical imaging, and/or EHR data domains
  • Knowledge of HIPAA, HL7 and other healthcare data privacy requirements
  • Hands-on experience with fully managed data warehousing solutions such as Azure Synapse, AWS Redshift, BigQuery, Snowflake, etc.
  • Azure Batch & Blob Storage

Job ID: 144185527
