
Aumni

Data Engineer - Python/ETL

3-5 Years

Job Description

Job Summary:

We are looking for a Data Engineer with 3 to 5 years of experience to join our team and contribute to building scalable, reliable, and high-performing data pipelines.

The ideal candidate should have strong expertise in Python, PySpark, SQL, AWS Cloud (EMR, Glue, Athena), Apache Airflow, and data warehousing concepts.

You will be responsible for designing, developing, and optimizing data pipelines that enable data-driven decision-making across the organization.

Key Responsibilities

  • Understand business requirements and translate them into scalable data engineering solutions.
  • Design, develop, and maintain ETL/ELT pipelines from various sources (databases, APIs, files, streaming); see the sketch after this list.
  • Work extensively with AWS cloud services (S3, EMR, Glue, Athena, Lake Formation) to build and optimize data workflows.
  • Implement workflows/orchestration using Apache Airflow or equivalent tools.
  • Write efficient SQL queries for data extraction, transformation, and reporting.
  • Work with PySpark and distributed computing frameworks to process large-scale datasets.
  • Apply data warehousing concepts to design and manage data models supporting analytics and reporting.
  • Optimize Spark jobs for performance, cost efficiency, and scalability.
  • Ensure data quality, reliability, and governance through validation, monitoring, and automation.
  • Collaborate with analysts and business teams to deliver trusted data solutions.
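
For orientation, a minimal PySpark ETL sketch of the kind of pipeline work described above (not from the original posting; the bucket paths, column names, and aggregation are hypothetical placeholders):

```python
# Minimal PySpark ETL sketch: extract raw events from S3, transform,
# and load curated Parquet for querying via Athena/Glue.
# All paths and columns below are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders_daily_etl").getOrCreate()

# Extract: read raw JSON events landed in S3
raw = spark.read.json("s3://example-raw-bucket/orders/2024-01-01/")

# Transform: drop invalid rows, derive revenue, aggregate per day
daily = (
    raw.filter(F.col("order_id").isNotNull())
       .withColumn("revenue", F.col("quantity") * F.col("unit_price"))
       .groupBy(F.to_date("created_at").alias("order_date"))
       .agg(
           F.sum("revenue").alias("total_revenue"),
           F.countDistinct("order_id").alias("order_count"),
       )
)

# Load: write partitioned Parquet back to S3 for downstream analytics
daily.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3://example-curated-bucket/orders_daily/"
)
```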

Required Skills

  • Programming: Strong expertise in Python (with Pandas) and SQL.
  • Big Data Processing: Hands-on experience with PySpark and Spark optimization techniques.
  • Cloud Platforms: Proficiency in AWS (EMR, Glue, S3, Athena).
  • Workflow Orchestration: Experience with Apache Airflow for job scheduling and automation (see the DAG sketch after this list).
  • Data Warehousing: Solid understanding of data warehousing concepts, dimensional modeling, and ETL best practices.
  • Database Skills: Experience with relational databases (PostgreSQL).
  • Streaming & Messaging: Understanding of Kafka for real-time data streaming and integration.
  • Containerization: Knowledge of Docker for packaging and deploying data applications.
  • Best Practices: Familiarity with CI/CD, version control (Git), and modern data engineering standards.
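
As a sketch of the orchestration skill above (not from the original posting; the DAG id, schedule, and task callables are hypothetical, and Airflow 2.4+ is assumed for the `schedule` argument):

```python
# Minimal Airflow DAG sketch for a daily extract-transform-load pipeline.
# dag_id, schedule, and callables are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    print("pull raw data from source systems")


def transform():
    print("run the PySpark job, e.g. on EMR")


def load():
    print("publish curated tables for Athena")


with DAG(
    dag_id="orders_daily_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # Airflow 2.4+ style; older versions use schedule_interval
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    # Enforce task ordering: extract, then transform, then load
    t_extract >> t_transform >> t_load
```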

(ref:hirist.tech)


Job ID: 138275923