Job Description – Data Engineer (AWS, PySpark, Advanced SQL)
Role: Data Engineer
Experience: 5–9 Years
Location: Open
Notice Period: Immediate to 30 Days Preferred
Job Summary
We are looking for a highly skilled Data Engineer with strong expertise in AWS cloud technologies, PySpark, Advanced SQL, AWS Glue, Redshift, and S3. The ideal candidate should have hands-on experience designing scalable ETL/ELT pipelines, optimizing big data workloads, and building cloud-based data platforms for analytics and reporting solutions.
Key Responsibilities
- Design, develop, and maintain scalable data pipelines using PySpark and AWS services (a brief sketch follows this list).
- Build and optimize ETL/ELT workflows using AWS Glue.
- Develop efficient data ingestion frameworks from multiple structured and unstructured data sources.
- Create and optimize complex SQL queries, stored procedures, and transformations in Redshift.
- Work extensively with Amazon S3 for data storage, partitioning, and lifecycle management.
- Implement data quality checks, monitoring, logging, and error-handling mechanisms.
- Optimize Spark jobs for performance, scalability, and cost efficiency.
- Collaborate with Data Analysts, BI teams, and business stakeholders on data requirements.
- Ensure data security, governance, and compliance standards are followed.
- Participate in code reviews, deployment processes, and production support activities.
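To ground the pipeline work described above, here is a minimal sketch of the kind of PySpark job this role involves: ingest raw CSV from S3, apply a basic data quality filter, and write partitioned Parquet back to S3. All bucket names, paths, and column names are hypothetical placeholders, not references to any specific project.

```python
# Minimal PySpark ETL sketch: raw CSV in S3 -> cleaned, partitioned Parquet.
# Bucket names, paths, and columns are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-etl").getOrCreate()

# Ingest raw data from S3.
raw = (
    spark.read
    .option("header", "true")
    .csv("s3://example-raw-bucket/orders/")
)

# Basic data quality checks: drop rows missing the key, cast types explicitly.
clean = (
    raw.filter(F.col("order_id").isNotNull())
       .withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("amount", F.col("amount").cast("double"))
       .withColumn("order_date", F.to_date("order_ts"))
)

# Write Parquet partitioned by date so downstream queries can prune partitions.
(
    clean.write
    .mode("overwrite")
    .partitionBy("order_date")
    .parquet("s3://example-curated-bucket/orders/")
)
```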
Required Skills
Technical Skills
- Strong experience in AWS Cloud Services.
- Hands-on experience with:
  - AWS Glue
  - Amazon Redshift
  - Amazon S3
  - AWS IAM
  - Amazon CloudWatch
  - AWS Lambda (good to have)
- Strong expertise in PySpark and Spark SQL.
- Advanced SQL knowledge (illustrated in the sketch after this list), including:
  - Complex joins
  - Window functions
  - CTEs
  - Query optimization
  - Performance tuning
- Experience building large-scale ETL/ELT pipelines.
- Knowledge of data warehousing concepts and dimensional modeling.
- Experience handling large datasets and distributed data processing.
- Familiarity with Git, CI/CD pipelines, and Agile methodology.
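As a rough indication of the SQL depth expected, the snippet below combines a CTE with a window function in Spark SQL to keep each customer's single largest order. Table and column names are invented for illustration.

```python
# Illustrative Spark SQL: a CTE plus a window function to keep each
# customer's single largest order. Table and column names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-example").getOrCreate()

# Tiny in-memory table standing in for a warehouse table.
spark.createDataFrame(
    [("c1", 1, 120.0), ("c1", 2, 80.0), ("c2", 3, 200.0)],
    ["customer_id", "order_id", "amount"],
).createOrReplaceTempView("orders")

top_orders = spark.sql("""
    WITH ranked AS (
        SELECT customer_id,
               order_id,
               amount,
               ROW_NUMBER() OVER (
                   PARTITION BY customer_id
                   ORDER BY amount DESC
               ) AS rn
        FROM orders
    )
    SELECT customer_id, order_id, amount
    FROM ranked
    WHERE rn = 1
""")
top_orders.show()
```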
Good to Have
- Experience with Airflow or other orchestration tools (a minimal DAG sketch follows this list).
- Knowledge of Kafka/Kinesis streaming pipelines.
- Exposure to Snowflake or Databricks.
- Python scripting experience.
- Experience in healthcare, finance, or retail domains.
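For the orchestration point above, a minimal Airflow DAG using the TaskFlow API (Airflow 2.x) might look like the sketch below. The schedule, task names, and task bodies are placeholders only, not a prescribed design.

```python
# Minimal Airflow 2.x DAG sketch chaining an extract step into a load step.
# Schedule, task names, and task bodies are hypothetical placeholders.
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def orders_pipeline():
    @task
    def extract() -> str:
        # In a real pipeline this might land raw files or trigger a Glue job.
        return "s3://example-raw-bucket/orders/"

    @task
    def load(path: str) -> None:
        # Placeholder for a Redshift COPY or a Spark job submission.
        print(f"loading from {path}")

    load(extract())


orders_pipeline()
```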
Educational Qualification
- Bachelor's/Master's degree in Computer Science, Information Technology, or a related field.
Preferred Candidate Profile
- Strong analytical and problem-solving skills.
- Excellent communication and stakeholder management abilities.
- Ability to work independently in a fast-paced environment.
- Experience working in production support and optimization activities.
Interview Focus Areas
- PySpark transformations and optimization (see the sketch after this list)
- Advanced SQL query writing
- AWS Glue architecture and workflows
- Redshift performance tuning
- Data modeling concepts
- S3 partitioning and file formats (Parquet/ORC/CSV)
- Real-time project scenarios and troubleshooting
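On the PySpark optimization topic above, a recurring discussion point is avoiding unnecessary shuffles. The sketch below broadcasts a small dimension table into a join and coalesces the output to limit small files; all paths and column names are made up for illustration.

```python
# Common Spark optimization sketch: broadcast a small dimension table so the
# large fact table is joined without a full shuffle. Names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("join-optimization").getOrCreate()

facts = spark.read.parquet("s3://example-curated-bucket/orders/")    # large
dims = spark.read.parquet("s3://example-curated-bucket/products/")   # small

# Broadcast hint: ships the small table to every executor, turning the join
# into a map-side operation with no shuffle of the fact table.
joined = facts.join(broadcast(dims), on="product_id", how="left")

# Coalesce before writing to avoid producing thousands of tiny output files.
joined.coalesce(32).write.mode("overwrite").parquet(
    "s3://example-curated-bucket/orders_enriched/"
)
```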
Skills: Redshift, AWS Glue, SQL, PySpark