Project Role : Data Engineer
Project Role Description : Design, develop and maintain data solutions for data generation, collection, and processing. Create data pipelines, ensure data quality, and implement ETL (extract, transform and load) processes to migrate and deploy data across systems.
Must have skills : Apache Spark, AWS Glue
Good to have skills : NA
Minimum 3 year(s) of experience is required
Educational Qualification : 15 years of full-time education
Summary:
As a Data Engineer, you will design, develop, and maintain data solutions that facilitate data generation, collection, and processing. Your typical day will involve creating data pipelines, ensuring data quality, and implementing ETL processes to migrate and deploy data across various systems. You will collaborate with cross-functional teams to understand data requirements, provide innovative solutions to enhance data accessibility and usability, and contribute to the overall data strategy of the organization, ensuring that data solutions are efficient, scalable, and aligned with business objectives.
We are looking for a skilled Data Engineer to join a data migration project. In this developer-focused role, you will build and optimize ETL pipelines to migrate and transform data from legacy on-premises systems to AWS, leveraging AWS-native transformation technologies.
Key Responsibilities
Implement end-to-end ETL pipelines using Apache Spark and AWS-native services such as Glue (PySpark, SQL), Step Functions, EventBridge, and Lambda for data extraction, transformation, and loading (see the sketch after this list).
Use pre-created utilities for seamless migration, handling high-volume datasets with error handling and retry mechanisms.
Collaborate with the Solution Designer to test pipelines for performance, and deploy them with the help of a DevOps engineer.
Monitor and troubleshoot pipelines using CloudWatch, optimize for cost and process scalability, and document code for team handover.
Support data quality validation and incremental loads to maintain data integrity during the hydration process.
Engage with multiple teams and contribute to key decisions. Provide solutions to problems for your immediate team and across multiple teams.
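For illustration only, below is a minimal sketch of the kind of Glue PySpark job these responsibilities describe: extract from the Glue Data Catalog, apply a simple transformation, and load Parquet to S3. The job parameter names (source_database, source_table, target_path) and the partition column are hypothetical; the actual project relies on its own pre-created utilities and configuration.

```python
import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext
from pyspark.sql import functions as F

# Resolve job parameters (parameter names are hypothetical)
args = getResolvedOptions(
    sys.argv, ["JOB_NAME", "source_database", "source_table", "target_path"]
)

sc = SparkContext()
glue_context = GlueContext(sc)
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Extract: read the source table from the Glue Data Catalog; the
# transformation_ctx enables job bookmarks for incremental loads
source_dyf = glue_context.create_dynamic_frame.from_catalog(
    database=args["source_database"],
    table_name=args["source_table"],
    transformation_ctx="source_dyf",
)

# Transform: basic cleansing and audit columns using Spark SQL functions
df = (
    source_dyf.toDF()
    .dropDuplicates()
    .withColumn("load_ts", F.current_timestamp())
    .withColumn("load_dt", F.current_date())
)

# Load: append partitioned Parquet to the target S3 path
(
    df.write.mode("append")
    .partitionBy("load_dt")
    .parquet(args["target_path"])
)

job.commit()
```

In practice, the same pattern would be wrapped in the project's error handling and retry mechanisms and orchestrated via Step Functions or EventBridge, with CloudWatch used for monitoring as noted above.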
Professional & Technical Skills:
- Must To Have Skills: Proficiency in Apache Spark, AWS Glue, Pyspark, SQL, ETL, Unix, Iceberg, Astronomer, DW concepts
- Strong understanding of data pipeline architecture and design principles.
- Experience with data warehousing solutions and ETL processes.
- Familiarity with cloud computing services and data storage solutions.
- Ability to troubleshoot and optimize data workflows for performance.
Professional & Technical Skills:
- Must To Have Skills: Proficiency in Apache Spark, AWS Glue.
- Experience with data warehousing solutions and data lake architectures.
- Strong understanding of ETL processes and data integration techniques.
- Familiarity with cloud platforms and services, particularly AWS.
- Knowledge of data governance and data quality best practices.
Additional Information:
- The candidate should have a minimum of 3 years of experience in Apache Spark.
- This position is based at our Pune office.
- A minimum of 15 years of full-time education is required.