We are seeking an experienced data scientist who, in addition to the required mathematical and statistical expertise, possesses the natural curiosity and creative mind to ask questions, connect the dots, and uncover hidden opportunities, with the ultimate goal of realizing the data's full potential.
Roles & Responsibilities
Develop modern data warehouse solutions using Databricks and the AWS/Azure stack
Provide forward-thinking solutions in the data engineering and analytics space
Collaborate with DW/BI leads to understand new ETL pipeline development requirements
Triage issues to find gaps in existing pipelines and fix them
Work with the business to understand reporting-layer needs and develop data models to fulfil them
Help junior team members resolve issues and technical challenges
Drive technical discussions with client architects and team members
Orchestrate data pipelines via the Airflow scheduler
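As an illustration of the Airflow orchestration responsibility above, a minimal DAG sketch is shown below. This is a pipeline-definition fragment, not a reference implementation: the DAG id, task names, and callables are hypothetical, and it assumes Apache Airflow 2.x is installed.

```python
# Minimal Airflow DAG sketch: a three-step ETL pipeline scheduled daily.
# DAG id, task ids, and the placeholder callables are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    ...  # placeholder: pull raw data from a source system


def transform():
    ...  # placeholder: clean and reshape the extracted data


def load():
    ...  # placeholder: write results to the warehouse


with DAG(
    dag_id="example_etl",              # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    # Declare the dependency chain: extract, then transform, then load.
    t_extract >> t_transform >> t_load
```

In practice the callables would be replaced by real operators (e.g. Databricks or Spark-submit operators) appropriate to the stack.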
Qualification & Experience
Bachelor's and/or master's degree in computer science, or equivalent experience
Must have 6+ years of total IT experience and 3+ years of experience in data warehouse/ETL projects
Deep understanding of Star and Snowflake dimensional modelling
Strong knowledge of Data Management principles
Good understanding of Databricks Data & AI platform and Databricks Delta Lake Architecture
Should have hands-on experience in SQL, Python and Spark (PySpark)
Candidate must have experience with the AWS/Azure stack
Desirable: experience with batch and streaming ETL (e.g. Kinesis)
Experience building ETL / data warehouse transformation processes
Experience with Apache Kafka for streaming / event-based data
Experience with other open-source big data products, e.g. the Hadoop ecosystem (incl. Hive, Pig, Impala)
Experience with open-source non-relational / NoSQL data repositories (incl. MongoDB, Cassandra, Neo4j)
Experience working with structured and unstructured data including imaging & geospatial data
Experience working in a DevOps environment with tools such as Terraform, CircleCI, and Git
Proficiency in RDBMS, complex SQL, PL/SQL, Unix shell scripting, performance tuning, and troubleshooting
Databricks Certified Data Engineer Associate/Professional certification (desirable)
Comfortable working in a dynamic, fast-paced, innovative environment with several ongoing concurrent projects
Should have experience working in Agile methodology
Strong verbal and written communication skills
Strong analytical and problem-solving skills with a high attention to detail
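The dimensional-modelling and SQL requirements above can be illustrated with a minimal star-schema sketch: a central fact table joined to dimension tables for reporting queries. The example below uses Python's built-in sqlite3 for portability; all table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# A tiny star schema: one fact table referencing two dimension tables.
cur.executescript("""
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, calendar_date TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE fact_sales (
    date_key INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    amount REAL
);
""")

cur.executemany("INSERT INTO dim_date VALUES (?, ?)",
                [(1, "2024-01-01"), (2, "2024-01-02")])
cur.executemany("INSERT INTO dim_product VALUES (?, ?)",
                [(10, "widget"), (20, "gadget")])
cur.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                [(1, 10, 5.0), (1, 20, 3.0), (2, 10, 2.0)])

# Typical star-schema query: join the fact table to a dimension and aggregate.
rows = cur.execute("""
    SELECT p.product_name, SUM(f.amount) AS total
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    GROUP BY p.product_name
    ORDER BY p.product_name
""").fetchall()

print(rows)  # [('gadget', 3.0), ('widget', 7.0)]
```

A snowflake schema would further normalize the dimensions (e.g. splitting product category into its own table); the fact-to-dimension join pattern stays the same.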
Mandatory Skills: Python / PySpark / Spark with Azure or AWS Databricks