About the role
We are seeking Senior Data Engineers who are passionate about data and analytics to join our data engineering team. The ideal candidate will have a strong background in handling large volumes of data with Apache Spark to build and enhance bespoke systems that harness data. The key focus of the position is to build and maintain systems that capture and store data on behalf of the business.
Key Responsibilities
- Design, develop, and maintain ETL processes and data pipelines using AWS Glue with PySpark.
- Collaborate with data scientists, analysts, and other stakeholders to understand data requirements and deliver high-quality data solutions.
- Optimize and tune data pipelines for performance and scalability.
- Ensure data quality and integrity through robust testing and validation processes.
- Implement data governance and security best practices.
- Monitor and troubleshoot data pipelines to ensure continuous data flow and address any issues promptly.
- Stay up-to-date with the latest trends and technologies in data engineering.
Required Qualifications
- B.E./B.Tech., preferably in Computer Science Engineering, with relevant work experience.
- 7+ years of experience handling data and designing ETL pipelines, including a mandatory 4+ years of experience writing Spark applications.
- Experience with AWS services such as Glue, Athena, S3, and Redshift is a plus.
- Good to have: exposure to data modeling, data analytics, and design for both batch processing and real-time streaming.
- Solid understanding of data mapping, data processing patterns, distributed computing, and building applications for real-time and batch analytics.
- Strong programming skills in design and implementation using Python, PySpark, and SQL.
- Good exposure to database architecture with SQL.
- Experience with multiple file formats such as Avro, Parquet, ORC, and JSON.
- Experience developing, constructing, testing, and maintaining architectures for data lakes, data pipelines, data warehouses, and large-scale data processing systems on Databricks.