zorba ai

Data Engineer – AWS, PySpark, Advanced SQL, AWS Glue, Redshift

  • Posted an hour ago

Job Description

Job Description – Data Engineer (AWS, PySpark, Advanced SQL)

Role: Data Engineer

Experience: 5–9 Years

Location: Open

Notice Period: Immediate to 30 Days Preferred

Job Summary

We are looking for a highly skilled Data Engineer with strong expertise in AWS cloud technologies, PySpark, Advanced SQL, AWS Glue, Redshift, and S3. The ideal candidate should have hands-on experience designing scalable ETL/ELT pipelines, optimizing big data workloads, and building cloud-based data platforms for analytics and reporting solutions.

Key Responsibilities

  • Design, develop, and maintain scalable data pipelines using PySpark and AWS services.
  • Build and optimize ETL/ELT workflows using AWS Glue.
  • Develop efficient data ingestion frameworks from multiple structured and unstructured data sources.
  • Create and optimize complex SQL queries, stored procedures, and transformations in Redshift.
  • Work extensively with Amazon S3 for data storage, partitioning, and lifecycle management.
  • Implement data quality checks, monitoring, logging, and error-handling mechanisms.
  • Optimize Spark jobs for performance, scalability, and cost efficiency.
  • Collaborate with Data Analysts, BI teams, and business stakeholders to gather data requirements.
  • Ensure data security, governance, and compliance standards are followed.
  • Participate in code reviews, deployment processes, and production support activities.

Required Skills

Technical Skills

  • Strong experience in AWS Cloud Services.
  • Hands-on experience with:
    • AWS Glue
    • Amazon Redshift
    • Amazon S3
    • AWS IAM
    • CloudWatch
    • Lambda (Good to Have)
  • Strong expertise in PySpark and Spark SQL.
  • Advanced SQL knowledge including:
    • Complex joins
    • Window functions
    • CTEs
    • Query optimization
    • Performance tuning
  • Experience building large-scale ETL/ELT pipelines.
  • Knowledge of data warehousing concepts and dimensional modeling.
  • Experience handling large datasets and distributed data processing.
  • Familiarity with Git, CI/CD pipelines, and Agile methodology.
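
As an illustration of the Advanced SQL areas listed above (CTEs, window functions), here is a minimal, self-contained sketch using Python's built-in sqlite3. The `sales` table, column names, and values are hypothetical and exist only to show the query shape; in the actual role this style of query would run against Redshift.

```python
import sqlite3

# Hypothetical in-memory table used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 100), ("east", 300), ("west", 200)],
)

# A CTE combined with a window function: rank each sale within its region.
query = """
WITH ranked AS (
    SELECT region, amount,
           RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS rnk
    FROM sales
)
SELECT region, amount, rnk
FROM ranked
ORDER BY region, rnk
"""
rows = conn.execute(query).fetchall()
# rows → [('east', 300, 1), ('east', 100, 2), ('west', 200, 1)]
```

The same CTE-plus-window pattern carries over to Redshift SQL, where performance tuning then centers on distribution and sort keys.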
Good to Have

  • Experience with Airflow or other orchestration tools.
  • Knowledge of Kafka/Kinesis streaming pipelines.
  • Exposure to Snowflake or Databricks.
  • Python scripting experience.
  • Experience in healthcare, finance, or retail domains.

Educational Qualification

  • Bachelor's/Master's degree in Computer Science, Information Technology, or related field.

Preferred Candidate Profile

  • Strong analytical and problem-solving skills.
  • Excellent communication and stakeholder management abilities.
  • Ability to work independently in a fast-paced environment.
  • Experience working in production support and optimization activities.

Interview Focus Areas

  • PySpark transformations and optimization
  • Advanced SQL query writing
  • AWS Glue architecture and workflows
  • Redshift performance tuning
  • Data modeling concepts
  • S3 partitioning and file formats (Parquet/ORC/CSV)
  • Real-time project scenarios and troubleshooting
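
To illustrate the S3 partitioning focus area above, the sketch below groups records into Hive-style partition prefixes (`dt=.../`), the layout commonly used for S3-backed tables queried through Glue and Athena. The bucket name, partition column, and records are hypothetical, and the code uses plain Python rather than PySpark so it stays self-contained.

```python
from collections import defaultdict

def partition_keys(records, table_prefix, partition_col):
    """Group records under Hive-style partition prefixes.

    Returns a mapping of object key -> records, mimicking how a
    partitioned Parquet write lays files out under an S3 prefix.
    Illustrative only; a real pipeline would use a Spark/Glue writer.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec[partition_col]].append(rec)
    return {
        f"{table_prefix}/{partition_col}={value}/part-0000.parquet": rows
        for value, rows in groups.items()
    }

records = [
    {"dt": "2024-01-01", "amount": 10},
    {"dt": "2024-01-01", "amount": 20},
    {"dt": "2024-01-02", "amount": 5},
]
layout = partition_keys(records, "s3://my-bucket/sales", "dt")
# layout has one key per dt value, e.g.
# s3://my-bucket/sales/dt=2024-01-01/part-0000.parquet
```

Partitioning on a commonly filtered column such as a date lets query engines prune whole prefixes, which is why it pairs naturally with columnar formats like Parquet and ORC.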

Skills: Redshift, AWS Glue, SQL, PySpark


Job ID: 147198635
