Job Title: Senior AWS Data Engineer
Location: Bangalore & Hyderabad (Hybrid)
Experience: 5-12 Years
The Role
We are looking for a Senior Data Engineer who will manage high-availability AWS environments while building scalable ETL pipelines with AWS Glue (PySpark).
Responsibilities
- Infrastructure: Design and manage secure, scalable AWS environments (VPC, EC2, S3, RDS, IAM).
- Data Engineering: Develop AWS Glue ETL jobs (PySpark), manage Data Catalogs, and optimize S3 Data Lakes.
- Containers: Package and deploy workloads using Docker and EKS/ECS.
- Operations: Implement CloudWatch monitoring, troubleshoot Glue job failures, and optimize for cost/performance.
AWS Glue & Data Engineering
- Design, develop, and maintain AWS Glue ETL jobs using PySpark / Python.
- Build and manage AWS Glue Crawlers, Data Catalogs, and metadata structures.
- Implement batch and near-real-time ingestion pipelines into Amazon S3–based data lakes.
- Design efficient partitioning, schema management, and job performance tuning strategies.
- Implement error handling, retry logic, logging, and operational robustness for Glue jobs (see the sketch after this list).
- Develop Glue Workflows and Triggers to orchestrate multi-step pipelines.
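To give candidates a concrete sense of this work, here is a minimal sketch of a Glue ETL job of the kind described above. It is illustrative only: the job, database, table, column, and bucket names are hypothetical placeholders, not part of an actual project.

```python
import sys
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

# Standard Glue job bootstrap: resolve arguments and initialize contexts.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glue_context = GlueContext(sc)
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read from the Glue Data Catalog (database and table names are placeholders).
source = glue_context.create_dynamic_frame.from_catalog(
    database="raw_db", table_name="orders"
)

# Drop records missing the key before writing downstream.
cleaned = source.toDF().dropna(subset=["order_id"])

# Write partitioned Parquet to the curated S3 layer (bucket is a placeholder).
cleaned.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3://example-curated-bucket/orders/"
)

job.commit()
```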
AWS-Native Data Platform Development
- Build data pipelines using Amazon S3 (raw, curated, and consumption layers) and AWS Glue, with Athena and/or Redshift for query and analytics access.
- Apply data modeling practices aligned to analytics use cases (curated / consumption-ready data).
- Support structured, semi-structured, and JSON-based data sources (a raw-to-curated sketch follows this list).
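The sketch below shows one way a raw-to-curated hop might look in plain PySpark, assuming JSON events land in a raw S3 prefix; the lake paths, event schema, and column names are assumptions for illustration.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("raw-to-curated").getOrCreate()

# Read semi-structured JSON events from the raw layer (path is a placeholder).
raw = spark.read.json("s3://example-lake/raw/events/")

# Flatten and type the fields needed for analytics (schema is an assumption).
curated = raw.select(
    F.col("event_id"),
    F.col("payload.user_id").alias("user_id"),
    F.to_date("event_ts").alias("event_date"),
)

# Write to the curated layer, partitioned by date so Athena scans stay cheap.
curated.write.mode("append").partitionBy("event_date").parquet(
    "s3://example-lake/curated/events/"
)
```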
Data Governance & Quality
- Implement data quality checks and validation within Glue pipelines (see the sketch after this list).
- Manage schema evolution, maintain backward compatibility, and control data changes.
- Apply IAM roles, permissions, and encryption standards for data access.
- Work with governance teams to support lineage, cataloging, and audit requirements.
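As a flavor of in-pipeline quality gates, here is a minimal validation sketch in PySpark; the key column and the 1% null threshold are illustrative assumptions, not a prescribed standard.

```python
from pyspark.sql import DataFrame, functions as F

def validate(df: DataFrame) -> None:
    """Fail the pipeline early if the batch violates basic quality rules."""
    total = df.count()
    if total == 0:
        raise ValueError("Data quality check failed: empty input batch")

    # Null-rate check on the key column (column name is a placeholder).
    null_keys = df.filter(F.col("order_id").isNull()).count()
    if null_keys / total > 0.01:  # assumed threshold: at most 1% null keys
        raise ValueError(f"Data quality check failed: {null_keys}/{total} null keys")

    # Duplicate-key check on non-null keys keeps downstream joins sane.
    non_null = df.filter(F.col("order_id").isNotNull())
    if non_null.select("order_id").distinct().count() != non_null.count():
        raise ValueError("Data quality check failed: duplicate order_id values")
```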
Operational Support
- Monitor Glue jobs and data pipelines using CloudWatch logs and metrics (a monitoring sketch follows this list).
- Perform troubleshooting, root cause analysis, and pipeline optimization.
- Ensure SLAs are met for critical data pipelines and analytics feeds.
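One possible monitoring pattern, sketched with boto3: inspect recent Glue job runs and publish a custom CloudWatch metric that alarms can act on. The job name, namespace, and metric name are hypothetical.

```python
import boto3

glue = boto3.client("glue")
cloudwatch = boto3.client("cloudwatch")

# Inspect the most recent runs of a Glue job (job name is a placeholder).
runs = glue.get_job_runs(JobName="orders-etl", MaxResults=5)["JobRuns"]
failures = sum(1 for r in runs if r["JobRunState"] == "FAILED")

# Publish a custom metric so CloudWatch alarms can page on repeated failures.
cloudwatch.put_metric_data(
    Namespace="DataPlatform/Pipelines",
    MetricData=[{
        "MetricName": "RecentGlueJobFailures",
        "Dimensions": [{"Name": "JobName", "Value": "orders-etl"}],
        "Value": failures,
        "Unit": "Count",
    }],
)
```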
Collaboration
- Partner closely with data product owners, BI teams, and downstream consumers.
- Translate business and analytics requirements into scalable technical solutions.
- Produce clear documentation for pipelines, datasets, and operational runbooks.
Required Skills & Qualifications
- 5+ years of experience in Data Engineering roles.
- Strong hands-on experience with AWS Glue (Jobs, Crawlers, Catalog, Workflows).
- Proficiency in Python and PySpark for ETL development.
- Solid experience with Amazon S3–based data lake architectures.
- Experience using Athena and/or Redshift for analytics consumption.
- Strong understanding of data partitioning, performance tuning, and cost optimization.
- Experience building production-grade, failure-tolerant data pipelines.