Key Responsibilities
Technical Leadership & Ownership
- Own the end-to-end data engineering architecture for large-scale AWS data platforms
- Define and enforce data engineering standards, best practices, and governance frameworks
- Lead design reviews, code reviews, and technical decision-making across teams
- Act as the primary technical escalation point for complex data pipeline issues
ETL/ELT Design & Development
- Design, build, and optimize scalable ETL/ELT pipelines using:
  - AWS Glue (Jobs, Workflows, Crawlers)
  - PySpark / Spark SQL, Snowflake, SnowSQL
  - Python-based data processing frameworks
- Implement incremental processing, CDC, and data partitioning strategies
- Develop reusable and modular data pipeline frameworks for enterprise use
Data Lake & Storage Management
- Design and manage data lake architecture on AWS (S3 + Apache Iceberg)
- Implement ACID-compliant data layers using Iceberg
- Optimize storage formats (Parquet, ORC) and data layouts for performance
- Define and enforce data lifecycle, retention, and archival policies
Performance Optimization & Cost Efficiency
- Tune Spark/Glue jobs for performance (memory, partitioning, caching)
- Optimize workloads for cost efficiency in AWS (compute, storage, I/O)
- Monitor and improve pipeline SLAs, throughput, and latency metrics
Data Governance & Quality
- Implement data quality frameworks, validations, and reconciliation checks
- Ensure compliance with data governance, lineage, and security standards
- Work with cataloging tools such as the AWS Glue Data Catalog for metadata management
Integration & Orchestration
- Design and manage end-to-end orchestration workflows (Glue Workflows, Step Functions, and Airflow where applicable)
- Integrate data across multiple sources (RDBMS, APIs, streaming platforms, files)
- Enable reliable, fault-tolerant, and restartable pipeline execution
Stakeholder Collaboration
- Partner with business, analytics, and AI teams to understand data requirements
- Collaborate with architects and DevOps teams for environment setup and automation
- Provide technical guidance to junior engineers and team members
Team Leadership & Mentoring
- Lead and mentor a team of data engineers
- Drive skill development in Spark, AWS, and modern data architectures
- Ensure adherence to Agile practices and timely delivery of milestones
Required Skills & Experience
Core Technical Skills
- Strong experience with the AWS data engineering stack:
  - AWS Glue, S3, Lambda, IAM, CloudWatch
- Advanced proficiency in:
  - PySpark / Apache Spark
  - Spark SQL
  - Python
- Hands-on experience with Apache Iceberg / modern table formats
- Deep understanding of ETL/ELT design patterns and data pipelines
Data Engineering Expertise
- Experience with data lake and lakehouse architectures
- Strong knowledge of data modeling (star/snowflake schemas)
- Experience with batch and near real-time processing
- Familiarity with file formats (Parquet, ORC, Avro)
Performance & Optimization
- Proven experience in large-scale data processing (TB/PB scale)
- Strong expertise in query optimization, partitioning, and indexing strategies
DevOps & Automation
- Experience with CI/CD pipelines for data workflows
- Knowledge of infrastructure as code (CloudFormation/Terraform) is a plus
- Familiarity with version control (Git) and deployment strategies
Preferred Skills (Good to Have)
- Experience with data orchestration tools (Airflow, Step Functions)
- Exposure to streaming frameworks (Kafka, Kinesis)
- Knowledge of data security (encryption, masking, access control)
- Experience supporting AI/ML data pipelines
- Exposure to BI tools (Power BI, Tableau, Sigma)
Qualifications
- Bachelor's/Master's degree in Computer Science, Engineering, or a related field
- 8–12+ years of experience in data engineering, with 3+ years in a technical leadership role