
Providence India

Lead Data Engineer

4-7 Years
  • Posted 22 hours ago
  • Be among the first 10 applicants

Job Description

About the Role

We are looking for a hands-on Data Engineer to design, build, and operate robust data pipelines and platforms on Snowflake with Azure. You will use strong SQL, Python/PySpark, ADF pipelines, and modern data-modeling practices to ingest data from diverse sources and enable AI/ML use cases via VectorDB indexing and embeddings. The role emphasizes reliability, performance, cost efficiency, and secure data operations in line with our enterprise platforms and standards.

Key Responsibilities
  • Design & build data pipelines on Snowflake and Azure (ADF, PySpark) to ingest data from REST APIs, files, and databases into curated zones.
  • Model data optimized for analytics, reporting, and downstream applications.
  • Develop embeddings & VectorDB indices to power semantic search/retrieval (e.g., generating embeddings and indexing into enterprise-approved vector stores, integrated with pipeline orchestration).
  • Own performance & cost optimization in Snowflake (SQL tuning, partitioning, caching, clustering, compute sizing).
  • Implement CI/CD and DevOps practices (Git branching, automated deploys for ADF/Snowflake).
  • Harden reliability (monitoring, alerting, retry logic, SLA tracking) and security/compliance (RBAC, secrets management, data governance, data lineage).
  • Collaborate with stakeholders (product, analytics, and platform teams) to translate requirements into technical design and deliver incremental value.

Must-Have Qualifications
  • 4-7 years total experience in data engineering on large-scale enterprise systems.
  • Snowflake: minimum 3 years of experience, with exposure to warehouse configuration, schema design, performance tuning, stored procedures/tasks, and loading strategies. Exposure to Snowflake Cortex AI.
  • SQL/Python/PySpark: design and implement scalable data processing solutions using SQL, Python, and distributed compute frameworks, including unit/integration tests.
  • Azure & ADF: ADLS Gen2, ADF pipelines/activities, triggers, parameterization, monitoring & troubleshooting.
  • Data modeling: apply data modeling techniques, including medallion architecture (Bronze/Silver/Gold).
  • API ingestion: designing resilient ingestion of REST/JSON, including pagination, auth, and rate-limit handling.
  • VectorDB & embeddings: experience generating embeddings and building vector indices for retrieval-augmented scenarios.
  • Knowledge graphs: exposure to building knowledge graphs and to Gremlin or Cypher graph query languages on CosmosDB/Neo4j.
  • Version control & CI/CD: Git, pull requests, automated deployment pipelines.
  • A results-oriented mindset with strong analytical and problem-solving skills.

Good to Have
  • Experience in the healthcare industry.
  • Prior experience working on data migration projects.
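The API-ingestion requirement above (pagination, retries, rate-limit handling) can be sketched in a minimal, dependency-free form. Everything here is illustrative, not a specific API: the `fetch_page` callable, the `{"records", "has_next"}` response shape, and the retried exception types are all assumptions a real pipeline would adapt to its source system.

```python
import time
from typing import Callable, Iterator

def fetch_with_retry(fetch: Callable[[], dict], max_attempts: int = 3,
                     backoff_s: float = 1.0) -> dict:
    """Call fetch(), retrying transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except (TimeoutError, ConnectionError):
            if attempt == max_attempts:
                raise  # exhausted retries: surface the error to the caller
            time.sleep(backoff_s * 2 ** (attempt - 1))  # back off before retrying

def paginate(fetch_page: Callable[[int], dict]) -> Iterator[object]:
    """Yield records page by page until the API reports no next page."""
    page = 1
    while True:
        body = fetch_with_retry(lambda: fetch_page(page))
        yield from body["records"]
        if not body.get("has_next"):
            break
        page += 1
```

Separating retry policy from pagination keeps each concern testable on its own; cursor-token APIs would replace the integer `page` with an opaque token returned by the previous response.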
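The VectorDB & embeddings item is about semantic retrieval; a toy brute-force cosine-similarity index shows the underlying search mechanics. `TinyVectorIndex` and the hand-written vectors are invented for illustration only — in practice embeddings come from a model and are indexed into an enterprise-approved vector store, not an in-memory list.

```python
import math
from typing import List, Tuple

def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

class TinyVectorIndex:
    """Brute-force in-memory stand-in for a real vector store."""

    def __init__(self) -> None:
        self._items: List[Tuple[str, List[float]]] = []  # (doc_id, embedding)

    def upsert(self, doc_id: str, embedding: List[float]) -> None:
        self._items.append((doc_id, embedding))

    def search(self, query_embedding: List[float], top_k: int = 3) -> List[str]:
        # Score every stored embedding, then return the best doc ids.
        scored = [(cosine(query_embedding, emb), doc_id)
                  for doc_id, emb in self._items]
        scored.sort(reverse=True)
        return [doc_id for _, doc_id in scored[:top_k]]
```

Production stores replace the linear scan with approximate-nearest-neighbor indexing, but the interface — upsert embeddings, query by vector, get top-k ids — is the same shape.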


About Company

Providence, one of the US's largest not-for-profit healthcare systems, is committed to high quality, compassionate healthcare for all. Driven by the belief that health is a human right and the vision, 'Health for a better world', Providence and its 121,000 caregivers strive to provide everyone access to affordable quality care and services.

Job ID: 137003201
