Job Description
Position Summary
A driven business analyst who can work on complex analytical problems and help customers make better business decisions, especially in the pharma/life sciences domain.
Job Responsibilities
Write PySpark queries for data transformation needs.
Participate with the team in the ETL design of new or changing mappings and workflows, using any Python framework, and prepare technical specifications.
Write complex SQL queries with performance tuning and optimization.
Handle tasks independently and lead the team if required.
Good communication skills.
Coordinate with cross-functional teams to ensure project objectives are met.
Collaborate with data architects and engineers to design and implement data models.
Education
BE/B.Tech
Master of Computer Applications
Work Experience
Advanced knowledge of PySpark, Python, pandas, and NumPy.
Minimum 4 years of extensive experience in the design, build, and deployment of Spark/PySpark for data integration.
Deep experience in developing data processing tasks using PySpark, such as reading data from external sources, merging data, performing data enrichment, and loading into target data destinations.
Experience creating Spark jobs for data transformation and aggregation.
Spark query tuning and performance optimization, with a good understanding of different file formats (ORC, Parquet, Avro) and compression techniques to optimize queries and processing.
Deep understanding of distributed systems (e.g. CAP theorem, partitioning, replication, consistency, and consensus).
Experience in modular programming and robust programming methodologies.
ETL knowledge, including hands-on ETL development using any Python framework.
Prior experience with Databricks/Snowflake is preferable.
Behavioural Competencies
Ownership
Teamwork & Leadership
Cultural Fit
Motivation to Learn and Grow
Technical Competencies
Problem Solving
Life Sciences Knowledge
Communication
Capability Building / Thought Leadership