As a Senior Technical Lead - Python Data Engineering, you will be part of an Agile team building healthcare applications and implementing new features while adhering to best coding and development standards.
Responsibilities: -
- Design and implement data integration processes (ETL) to facilitate the loading of data into the Staging environment and from Staging to Salesforce.
- Design and implement the Data Migration workstream, utilizing a combination of Talend and Salesforce Data Loader to migrate source data from relational sources, TRACS, and file-based data sources into Salesforce.
- Develop ETL programs to efficiently transfer data from 200+ source entities into the Staging area.
- Conduct unit testing post-migration, focusing on thorough data verification and validation.
- Enhance and standardize source data for improved usability and consistency.
- Perform validation checks on enriched source data against established rules and constraints.
- Execute data quality initiatives, including data cleanup and de-duplication, while defining business rules, calculations, processes, metrics, and value hierarchies to automate these tasks.
- Create matching groups and define matching expressions to identify and consolidate duplicate records.
- Consolidate records from various duplicates within a group using defined row or record-level consolidation rules.
- Carry out post-consolidation enrichment and validation activities to ensure data integrity.
- Validate the data following its migration to Master Data Management, focusing on optimization and required automation for improved efficiency.
- Apply technical experience to various data corrections, with advanced knowledge of handling data kickouts and conversions.
- Mentor junior team members, help onboard new team members, and perform other similar or related activities.
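The matching-group and consolidation responsibilities above can be sketched in Python. This is a minimal illustration, not the project's actual implementation: the field names (`last_name`, `email`, `phone`, `updated_at`), the matching expression, and the "most recent record wins, back-fill blanks" survivorship rule are all assumptions for the example.

```python
from collections import defaultdict

def matching_key(record):
    # Matching expression: normalized last name + case-insensitive email.
    # Field names are illustrative, not taken from this posting.
    return (record["last_name"].strip().lower(), record["email"].strip().lower())

def consolidate(group):
    # Record-level consolidation rule (assumed): the most recently updated
    # record survives; blank fields are back-filled from older duplicates.
    ordered = sorted(group, key=lambda r: r["updated_at"], reverse=True)
    survivor = dict(ordered[0])
    for dup in ordered[1:]:
        for field, value in dup.items():
            if survivor.get(field) in (None, ""):
                survivor[field] = value
    return survivor

def deduplicate(records):
    # Build matching groups, then collapse each group to one golden record.
    groups = defaultdict(list)
    for rec in records:
        groups[matching_key(rec)].append(rec)
    return [consolidate(g) for g in groups.values()]
```

In practice the matching expression and survivorship rules would come from the business rules defined during the data quality workstream; the sketch only shows the shape of the group-then-consolidate step.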
Experience: -
Educational Qualifications: -
- Engineering Degree: BE / ME / B.Tech / M.Tech / B.Sc. / M.Sc.
- Technical certification in multiple technologies is desirable.
Skills: -
Mandatory Technical skills: -
- Azure - Awareness of Azure services and storage.
- Python - Modular/framework-level programming experience using OOP, strong backend programming experience, and familiarity with JSON file handling.
- SQL - Advanced SQL, SQL performance tuning, and SQL data-issue fixes.
- Data warehousing - Solid hands-on experience with data-warehousing use cases.
- Data Modelling - Solid hands-on experience with data-modelling use cases.
- PySpark - Working awareness of PySpark.
- Programming Languages: Object-Oriented Programming; REST API integration, including OAuth, pagination, and structured data handling.
- SQL Expertise:
- Advanced SQL queries
- Query optimization
- JSON manipulation
- Stored procedures
- Data Engineering:
- End-to-End pipeline development
- Upstream and downstream data integration
- Data transformation and loading
- ETL Tools: Python-based ETL solutions (experience with orchestration tools such as Prefect and Airflow is a plus)
- Cloud Platforms: Azure and AWS; experienced in developing and deploying containerized Python applications via Container Apps and managed Kubernetes services.
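The REST API integration skill listed above (OAuth, pagination, structured data handling) can be illustrated with a small Python sketch. The `{"records": [...], "has_more": bool}` response shape is an assumption, and `fetch_page` is an injected callable standing in for an authenticated HTTP GET so the example stays self-contained.

```python
from typing import Callable, Iterator

def bearer_headers(access_token: str) -> dict:
    # An OAuth 2.0 access token (e.g. obtained via a client-credentials
    # grant) is typically sent as a Bearer header; token acquisition is
    # omitted here for brevity.
    return {"Authorization": f"Bearer {access_token}"}

def paginate(fetch_page: Callable[[int], dict], page_size: int = 100) -> Iterator[dict]:
    """Yield records across an offset-paginated API.

    `fetch_page(offset)` stands in for an authenticated HTTP request;
    the response shape used here is an assumption for the example.
    """
    offset = 0
    while True:
        page = fetch_page(offset)
        yield from page["records"]
        if not page.get("has_more"):
            break
        offset += page_size
```

Injecting the fetcher keeps the pagination logic testable without a live endpoint; in a real pipeline `fetch_page` would wrap an HTTP client call carrying `bearer_headers(...)`.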
Good to have skills: -
- Databases: Snowflake, SQL Server, Azure SQL
- ETL Tools: Python-based ETL, Azure Data Factory
- Orchestration: Prefect (preferred), Airflow
- AI/ML Integration:
- Connecting ML inference via REST APIs
- Feature engineering and dynamic variable creation
- Analytics Stack: Experience working with dbt Core and dbt Cloud.
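The "feature engineering and dynamic variable creation" bullet above can be sketched in Python as well. The claim fields and the `dx_` flag naming are hypothetical, and the eventual POST of the payload to an ML inference REST endpoint is omitted.

```python
import math

def engineer_features(record: dict) -> dict:
    """Build a feature payload for an ML inference call.

    Field names (claim_amount, diagnosis_codes) are illustrative; a real
    schema would come from the project's data model.
    """
    features = {
        # Log-transform a skewed monetary amount (a common choice, assumed here).
        "claim_amount_log": round(math.log1p(record["claim_amount"]), 4),
        "num_diagnoses": len(record["diagnosis_codes"]),
    }
    # Dynamic variable creation: one indicator flag per diagnosis code.
    features.update({f"dx_{code}": 1 for code in record["diagnosis_codes"]})
    return features
```

The resulting dict is what would be serialized to JSON and sent to the inference endpoint; the dynamic `dx_*` keys show how variables can be derived from the data rather than hard-coded.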