Design, build, and maintain on-premise data pipelines to ingest, process, and transform large volumes of data from multiple sources into data warehouses and data lakes
Develop and optimize PySpark and SQL jobs for high-performance batch and real-time data processing
Ensure the scalability, reliability, and performance of data infrastructure in an on-premise setup
Collaborate with data scientists, analysts, and business teams to translate data requirements into technical solutions
Troubleshoot and resolve issues in data pipelines and data processing workflows
Monitor, tune, and improve Hadoop clusters and data jobs for cost and resource efficiency
Stay current with on-premise big data technology trends and suggest enhancements to improve data engineering capabilities
Bachelor's degree in Computer Science, Software Engineering, or a related field
5+ years of experience in data engineering or a related domain
Strong programming skills in Python (with experience in PySpark)