
Ingrain Systems Inc

Data Engineer - AWS/PySpark


Job Description

Job Title : Data Engineer

Location : Hyderabad

Work Mode : Hybrid

Office Timings : 11 AM to 8 PM IST

Rounds of Interview : 2 (face-to-face)

Core Technical Expertise Required For This Role

  • AWS PySpark : Strong hands-on experience using PySpark within AWS environments to process large-scale datasets efficiently (an illustrative Glue/PySpark sketch follows this list).
  • AWS Glue : Experience building, maintaining, and optimizing AWS Glue jobs for data extraction, transformation, and loading.
  • AWS S3 : Proficient in working with Amazon S3 for data storage, data lake architecture, and integration with analytics pipelines.
  • PySpark : Ability to write optimized PySpark code for distributed data processing and transformation.
  • ETL Frameworks : Experience designing, developing, and maintaining scalable ETL frameworks for batch and streaming data pipelines.
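As a rough illustration of the kind of work these skills cover, here is a minimal AWS Glue job sketch in PySpark that reads raw data from S3, applies a transformation, and writes a curated output back to S3. The bucket names, column names, and aggregation are placeholder assumptions for illustration, not details from this posting.

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# Standard AWS Glue job setup: resolve job arguments and build contexts.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glue_context = GlueContext(sc)
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read raw data from an S3-backed data lake (bucket and prefix are placeholders).
orders = spark.read.parquet("s3://example-raw-bucket/orders/")

# Example transformation: keep completed orders and aggregate amounts by day.
daily_totals = (
    orders.filter(orders.status == "COMPLETED")
          .groupBy("order_date")
          .agg({"amount": "sum"})
)

# Write the curated output back to S3 for downstream analytics.
daily_totals.write.mode("overwrite").parquet(
    "s3://example-curated-bucket/daily_order_totals/"
)

job.commit()
```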

Skills that will provide additional value to the role :

  • Knowledge of Talend Cloud ETL : Familiarity with Talend Cloud for building and orchestrating ETL pipelines.
  • Kafka : Understanding of event-driven architectures and streaming data platforms.
  • Snowflake Cloud : Experience working with Snowflake for cloud-based data warehousing and analytics (a combined Kafka-to-Snowflake sketch follows this list).
  • Power BI : Exposure to data visualization and reporting using Power BI.
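As a sketch only, the optional Kafka and Snowflake skills might come together in a PySpark structured-streaming job that consumes a Kafka topic and appends micro-batches to a Snowflake table via the Spark-Snowflake connector. The broker address, topic, schema, table, and connection options below are assumptions, not details from the posting.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import DoubleType, StringType, StructField, StructType

spark = SparkSession.builder.appName("kafka-to-snowflake-sketch").getOrCreate()

# Placeholder schema for the Kafka event payload.
schema = StructType([
    StructField("event_id", StringType()),
    StructField("order_date", StringType()),
    StructField("amount", DoubleType()),
])

# Consume a Kafka topic as a stream (broker and topic names are placeholders).
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "orders")
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("payload"))
    .select("payload.*")
)

# Snowflake connection options are placeholders; requires the
# Spark-Snowflake connector to be available on the cluster.
sf_options = {
    "sfURL": "example_account.snowflakecomputing.com",
    "sfDatabase": "ANALYTICS",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "ETL_WH",
    "sfUser": "etl_user",
    "sfPassword": "***",
}

def write_to_snowflake(batch_df, batch_id):
    # Append each micro-batch to a Snowflake table.
    (batch_df.write.format("net.snowflake.spark.snowflake")
        .options(**sf_options)
        .option("dbtable", "ORDERS_STREAM")
        .mode("append")
        .save())

events.writeStream.foreachBatch(write_to_snowflake).start().awaitTermination()
```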

Qualification

(Educational background and professional experience requirements)

  • Bachelor's or Master's degree in Computer Science or a related discipline, or equivalent hands-on industry experience.
  • 3+ years of experience in application design, development, and analysis, with proven experience building and analyzing data-driven applications.
  • Hands-on experience designing and implementing solutions on AWS Cloud; retail industry experience is preferred.

Key Responsibilities

(What the role involves on a day-to-day basis)

  • Process data using Spark (PySpark) : Develop and manage Spark-based data processing pipelines that handle large volumes of structured and unstructured data.
  • Collaborate with data analysts, business, and analytics teams to ensure data meets their reporting and analysis needs and to deliver high-quality, reliable datasets.
  • Design and build end-to-end ETL frameworks that process and extract data from cloud databases using AWS services such as Lambda, Glue, PySpark, Step Functions, SNS, SQS, and Batch (see the orchestration sketch after this list).
  • Act as a strategic thinker: take ownership of data pipelines and contribute to data governance, best practices, and long-term architectural decisions.
  • Maintain sound knowledge of AWS services and how they integrate to build robust data platforms.
  • Ramp up quickly on the existing frameworks built in AWS PySpark and Glue, analyze current implementations, and ensure continuity.
  • Scan existing ETL frameworks to identify performance bottlenecks, improve efficiency, and propose optimizations and cost savings.
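To make the ETL-framework responsibility concrete, here is a hedged sketch of one common orchestration pattern: an AWS Lambda handler (boto3) that receives S3 object notifications delivered through SNS and SQS and starts a Glue job run for each new file. The job name, argument keys, and message structure are illustrative assumptions, not details from the posting.

```python
import json

import boto3

glue = boto3.client("glue")

def handler(event, context):
    """Triggered by S3 notifications delivered through SNS and SQS.

    For each newly arrived object, start a Glue ETL job run. The job name
    and argument keys below are placeholders for illustration only.
    """
    for sqs_record in event.get("Records", []):
        sns_envelope = json.loads(sqs_record["body"])     # SQS message body (SNS envelope)
        s3_event = json.loads(sns_envelope["Message"])    # S3 notification inside the envelope
        for s3_record in s3_event.get("Records", []):
            bucket = s3_record["s3"]["bucket"]["name"]
            key = s3_record["s3"]["object"]["key"]
            glue.start_job_run(
                JobName="example-orders-etl",
                Arguments={"--source_path": f"s3://{bucket}/{key}"},
            )
    return {"status": "ok"}
```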

(ref:hirist.tech)

Job ID: 141447649
