Job Description – Data Engineer (AWS, PySpark, Advanced SQL)
Role: Data Engineer
Experience: 5–9 Years
Location: Open
Notice Period: Immediate to 30 Days Preferred
Job Summary
We are looking for a highly skilled Data Engineer with strong expertise in AWS cloud technologies, PySpark, Advanced SQL, AWS Glue, Redshift, and S3. The ideal candidate should have hands-on experience designing scalable ETL/ELT pipelines, optimizing big data workloads, and building cloud-based data platforms for analytics and reporting solutions.
Key Responsibilities
- Design, develop, and maintain scalable data pipelines using PySpark and AWS services (a brief sketch follows this list).
- Build and optimize ETL/ELT workflows using AWS Glue.
- Develop efficient data ingestion frameworks from multiple structured and unstructured data sources.
- Create and optimize complex SQL queries, stored procedures, and transformations in Redshift.
- Work extensively with Amazon S3 for data storage, partitioning, and lifecycle management.
- Implement data quality checks, monitoring, logging, and error-handling mechanisms.
- Optimize Spark jobs for performance, scalability, and cost efficiency.
- Collaborate with Data Analysts, BI teams, and business stakeholders on data requirements.
- Ensure data security, governance, and compliance standards are followed.
- Participate in code reviews, deployment processes, and production support activities.
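To ground the pipeline work described above, here is a minimal sketch of the kind of PySpark job this role involves: ingest raw CSV from S3, apply a basic data quality filter, and write partitioned Parquet back to S3. All bucket names, paths, and column names are hypothetical placeholders, not references to any specific project.

```python
# Minimal PySpark ETL sketch: raw CSV in S3 -> cleaned, partitioned Parquet.
# Bucket names, paths, and columns are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-etl").getOrCreate()

# Ingest raw data from S3.
raw = (
    spark.read
    .option("header", "true")
    .csv("s3://example-raw-bucket/orders/")
)

# Basic data quality checks: drop rows missing the key, cast types explicitly.
clean = (
    raw.filter(F.col("order_id").isNotNull())
       .withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("amount", F.col("amount").cast("double"))
       .withColumn("order_date", F.to_date("order_ts"))
)

# Write Parquet partitioned by date so downstream queries can prune partitions.
(
    clean.write
    .mode("overwrite")
    .partitionBy("order_date")
    .parquet("s3://example-curated-bucket/orders/")
)
```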
Required Skills
Technical Skills
- Strong experience in AWS Cloud Services.
- Hands-on experience with:
  - AWS Glue
  - Amazon Redshift
  - Amazon S3
  - AWS IAM
  - Amazon CloudWatch
  - AWS Lambda (good to have)
- Strong expertise in PySpark and Spark SQL.
- Advanced SQL knowledge (illustrated in the sketch after this list), including:
  - Complex joins
  - Window functions
  - CTEs
  - Query optimization
  - Performance tuning
- Experience building large-scale ETL/ELT pipelines.
- Knowledge of data warehousing concepts and dimensional modeling.
- Experience handling large datasets and distributed data processing.
- Familiarity with Git, CI/CD pipelines, and Agile methodology.
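As a rough indication of the SQL depth expected, the snippet below combines a CTE with a window function in Spark SQL to keep each customer's single largest order. Table and column names are invented for illustration.

```python
# Illustrative Spark SQL: a CTE plus a window function to keep each
# customer's single largest order. Table and column names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-example").getOrCreate()

# Tiny in-memory table standing in for a warehouse table.
spark.createDataFrame(
    [("c1", 1, 120.0), ("c1", 2, 80.0), ("c2", 3, 200.0)],
    ["customer_id", "order_id", "amount"],
).createOrReplaceTempView("orders")

top_orders = spark.sql("""
    WITH ranked AS (
        SELECT customer_id,
               order_id,
               amount,
               ROW_NUMBER() OVER (
                   PARTITION BY customer_id
                   ORDER BY amount DESC
               ) AS rn
        FROM orders
    )
    SELECT customer_id, order_id, amount
    FROM ranked
    WHERE rn = 1
""")
top_orders.show()
```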
Good to Have
- Experience with Airflow or other orchestration tools (a minimal DAG sketch follows this list).
- Knowledge of Kafka/Kinesis streaming pipelines.
- Exposure to Snowflake or Databricks.
- Python scripting experience.
- Experience in healthcare, finance, or retail domains.
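For the orchestration point above, a minimal Airflow DAG using the TaskFlow API (Airflow 2.x) might look like the sketch below. The schedule, task names, and task bodies are placeholders only, not a prescribed design.

```python
# Minimal Airflow 2.x DAG sketch chaining an extract step into a load step.
# Schedule, task names, and task bodies are hypothetical placeholders.
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def orders_pipeline():
    @task
    def extract() -> str:
        # In a real pipeline this might land raw files or trigger a Glue job.
        return "s3://example-raw-bucket/orders/"

    @task
    def load(path: str) -> None:
        # Placeholder for a Redshift COPY or a Spark job submission.
        print(f"loading from {path}")

    load(extract())


orders_pipeline()
```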
Educational Qualification
- Bachelor's/Master's degree in Computer Science, Information Technology, or a related field.
Preferred Candidate Profile
- Strong analytical and problem-solving skills.
- Excellent communication and stakeholder management abilities.
- Ability to work independently in a fast-paced environment.
- Experience working in production support and optimization activities.
Interview Focus Areas
- PySpark transformations and optimization (see the sketch after this list)
- Advanced SQL query writing
- AWS Glue architecture and workflows
- Redshift performance tuning
- Data modeling concepts
- S3 partitioning and file formats (Parquet/ORC/CSV)
- Real-time project scenarios and troubleshooting
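On the PySpark optimization topic above, a recurring discussion point is avoiding unnecessary shuffles. The sketch below broadcasts a small dimension table into a join and coalesces the output to limit small files; all paths and column names are made up for illustration.

```python
# Common Spark optimization sketch: broadcast a small dimension table so the
# large fact table is joined without a full shuffle. Names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("join-optimization").getOrCreate()

facts = spark.read.parquet("s3://example-curated-bucket/orders/")    # large
dims = spark.read.parquet("s3://example-curated-bucket/products/")   # small

# Broadcast hint: ships the small table to every executor, turning the join
# into a map-side operation with no shuffle of the fact table.
joined = facts.join(broadcast(dims), on="product_id", how="left")

# Coalesce before writing to avoid producing thousands of tiny output files.
joined.coalesce(32).write.mode("overwrite").parquet(
    "s3://example-curated-bucket/orders_enriched/"
)
```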
Skills: Redshift, AWS Glue, SQL, PySpark