Must-Have Qualifications:
- AWS Expertise: Strong hands-on experience with AWS data services including Glue, Redshift, Athena, S3, Lake Formation, Kinesis, Lambda, Step Functions, EMR, and CloudWatch.
- ETL/ELT Engineering: Deep proficiency in designing robust ETL/ELT pipelines with AWS Glue (PySpark/Scala), Python, dbt, or other automation frameworks (a Glue job sketch follows this list).
- Data Modeling: Advanced knowledge of dimensional (Star/Snowflake) and normalized data modeling, optimized for Redshift and S3-based lakehouses (see the star-schema sketch after this list).
- Programming Skills: Proficient in Python, SQL, and PySpark, with automation and scripting skills for data workflows.
- Architecture Leadership: Demonstrated experience leading large-scale AWS data engineering projects across teams and domains.
- Pre-sales & Consulting: Proven experience working with clients, responding to technical RFPs, and designing cloud-native data solutions.
- Advanced PySpark Expertise: Deep hands-on experience writing optimized PySpark code for distributed data processing, including transformation pipelines using DataFrames, RDDs, and Spark SQL, with a strong grasp of lazy evaluation, the Catalyst optimizer, and the Tungsten execution engine (see the lazy-evaluation sketch after this list).
- Performance Tuning & Partitioning: Proven ability to debug and optimize Spark jobs through custom partitioning strategies, broadcast joins, caching, and checkpointing, with proficiency in tuning executor memory and shuffle configurations and leveraging the Spark UI for performance diagnostics in large-scale (terabyte-plus) data workloads (see the tuning sketch after this list).
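
To make the Glue/ETL expectations concrete, here is a minimal sketch of the kind of AWS Glue (PySpark) job this role involves: read from the Glue Data Catalog, cleanse, and write partitioned Parquet to S3. The database, table, column, and bucket names (raw_db, orders, s3://example-curated-bucket/...) are hypothetical placeholders, not part of the role description.

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.sql import functions as F

# Parse the arguments the Glue runtime passes to the job
args = getResolvedOptions(sys.argv, ["JOB_NAME"])

sc = SparkContext()
glue_context = GlueContext(sc)
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read raw orders from the Glue Data Catalog (hypothetical database/table)
orders = glue_context.create_dynamic_frame.from_catalog(
    database="raw_db", table_name="orders"
).toDF()

# Basic cleansing: drop malformed rows, standardize names, derive a partition key
cleaned = (
    orders.dropna(subset=["order_id", "order_ts"])
          .withColumnRenamed("order_ts", "order_timestamp")
          .withColumn("order_date", F.to_date("order_timestamp"))
)

# Write partitioned Parquet to the curated S3 zone (hypothetical bucket)
(cleaned.write
        .mode("overwrite")
        .partitionBy("order_date")
        .parquet("s3://example-curated-bucket/orders/"))

job.commit()
```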
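
A minimal sketch of the dimensional-modeling pattern referenced above: deriving a customer dimension with a surrogate key and a conformed fact table from a cleansed source, laid out as Parquet for an S3-based lakehouse. All table names, columns, and S3 paths here are hypothetical.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("star-schema-sketch").getOrCreate()

# Hypothetical cleansed source table
orders = spark.read.parquet("s3://example-curated-bucket/orders/")

# Dimension: one row per customer, keyed by a surrogate key
dim_customer = (
    orders.select("customer_id", "customer_name", "customer_segment")
          .dropDuplicates(["customer_id"])
          .withColumn("customer_sk", F.monotonically_increasing_id())
)

# Fact: measures keyed by the dimension's surrogate key
fact_sales = (
    orders.join(dim_customer.select("customer_id", "customer_sk"), "customer_id")
          .select("customer_sk", "order_id", "order_date", "amount")
)

# Persist both sides of the star schema to the lake (hypothetical paths)
dim_customer.write.mode("overwrite").parquet("s3://example-lake/dim_customer/")
fact_sales.write.mode("overwrite").parquet("s3://example-lake/fact_sales/")
```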
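
A short sketch of the DataFrame/Spark SQL duality and lazy evaluation named in the PySpark requirement: both formulations compile to the same Catalyst plan, and nothing executes until an action is called. The input path, columns, and table name are hypothetical.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("lazy-eval-sketch").getOrCreate()

# Hypothetical input; schema inferred from Parquet metadata
events = spark.read.parquet("s3://example-bucket/events/")

# Transformations are lazy: this only builds a logical plan
daily = (
    events.filter(F.col("event_type") == "purchase")
          .groupBy("event_date")
          .agg(F.sum("amount").alias("revenue"))
)

# The same logic via Spark SQL optimizes to the same Catalyst plan
events.createOrReplaceTempView("events")
daily_sql = spark.sql("""
    SELECT event_date, SUM(amount) AS revenue
    FROM events
    WHERE event_type = 'purchase'
    GROUP BY event_date
""")

# Only an action (show/count/write) triggers execution of the optimized plan
daily.show()
```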
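
And a sketch of the tuning techniques listed in the final requirement: broadcast joins, repartitioning against skew, caching reused results, and shuffle/executor configuration. The config values are placeholders to illustrate the knobs, not recommendations, and all input paths and columns are hypothetical.

```python
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("tuning-sketch")
    # Shuffle parallelism and executor memory are workload-dependent;
    # these values are illustrative placeholders only
    .config("spark.sql.shuffle.partitions", "400")
    .config("spark.executor.memory", "8g")
    .getOrCreate()
)

# Hypothetical inputs: a large fact table and a small dimension table
facts = spark.read.parquet("s3://example-bucket/fact_sales/")
dims = spark.read.parquet("s3://example-bucket/dim_product/")

# Broadcast the small side so the large fact table avoids a shuffle
joined = facts.join(F.broadcast(dims), on="product_id")

# Repartition by a well-distributed key ahead of a wide aggregation,
# then cache because the result feeds two downstream actions
by_region = joined.repartition("region_id").cache()

by_region.groupBy("region_id").agg(F.sum("amount").alias("total")).show()
by_region.groupBy("region_id").agg(
    F.countDistinct("order_id").alias("orders")
).show()
```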