
● A minimum of a BS degree in computer science, software engineering, or a related scientific discipline is desired
● Around 8 years of work experience in building scalable and robust data engineering solutions
● Proven experience contributing to complex projects independently
● Demonstrated ownership and accountability
● Strong understanding of object-oriented programming and proficiency in Python (TDD) and PySpark to build scalable algorithms
● 5+ years of experience in distributed computing and big data processing using the Apache Spark framework, including Spark optimization techniques
● 4+ years of experience with Databricks, including Delta Lake, Delta tables, Unity Catalog, Delta Sharing, Delta Live Tables (DLT), and incremental data processing
● Advanced SQL coding and query optimization experience, including the ability to write analytical and nested queries
● 5+ years of experience building scalable ETL/ELT data pipelines on Databricks and AWS (EMR)
● 2+ years of experience orchestrating data pipelines using Apache Airflow/MWAA
● Understanding of and experience with AWS services, including ADX, EC2, and S3
● 5+ years of experience with data modeling techniques for structured/unstructured datasets
● Experience with relational/columnar databases (Redshift, RDS) and interactive querying services (Athena, Redshift Spectrum)
● Passion for healthcare and improving patient outcomes
● Demonstrated analytical thinking and strong problem-solving skills
● Willingness to learn and stay on top of emerging technologies
● Experience working in an Agile environment
● Experience operating in a CI/CD environment
Job ID: 146472139