Senior Data Engineer

Go Digital Technology Consulting LLP

Mumbai, India

4-8 Years

Posted 7 months ago

Job Description

Location: Mumbai

Experience: 4 to 8 years

Technologies / Skills: Advanced SQL, Python and associated libraries such as Pandas and NumPy, PySpark, shell scripting, data modelling, big data, Hadoop, Hive, ETL pipelines.

Responsibilities:

Proven success in communicating with users, other technical teams, and senior management to collect requirements, describe data modeling decisions and develop data engineering strategy.

Ability to work with business owners to define key business requirements and convert them into user stories with the required technical specifications.

Communicate results and business impacts of insight initiatives to key stakeholders to collaboratively solve business problems.

Work closely with the overall Enterprise Data & Analytics Architect and Engineering practice leads to ensure adherence to best practices and design principles.

Ensure quality, security and compliance requirements are met for the supported area.

Design and create fault-tolerant data pipelines running on a cluster.

Excellent communication skills with the ability to influence client business and IT teams

Should have designed data engineering solutions end to end, with the ability to come up with scalable and modular solutions.

Required Qualifications:

3+ years of hands-on experience designing and developing data pipelines for data ingestion or transformation using Python (PySpark)/Spark SQL in the AWS cloud.

Experience in design and development of data pipelines and processing of data at scale.

Advanced experience in writing and optimizing efficient SQL queries with Python and Hive, handling large data sets in big data environments.

Experience in debugging, tuning and optimizing PySpark data pipelines.

Should have implemented, and have good knowledge of, PySpark concepts such as DataFrames, joins, caching, memory management, partitioning and parallelism.

Understanding of the Spark UI, event timelines, DAGs and Spark configuration parameters, in order to tune long-running data pipelines.

Experience working in Agile implementations

Experience with building data pipelines in streaming and batch mode.

Experience with Git and CI/CD pipelines to deploy cloud applications

Good knowledge of designing Hive tables with partitioning for performance.

Desired Qualifications:

Experience in data modelling

Hands-on experience creating workflows on any scheduling tool such as AutoSys or CA Workload Automation.