Job Description
Required Skills

- Strong proficiency in Python (data manipulation, scripting, automation)
- Advanced knowledge of SQL, including joins, subqueries, window functions, and performance tuning (see the sketch after this list)
- Hands-on experience with ETL/ELT tools or frameworks
- Experience working with relational databases (e.g., MySQL, PostgreSQL, SQL Server, Oracle)
- Solid understanding of data warehousing concepts (fact/dimension tables, star schema)
- Ability to work with large datasets and to optimize data processing workflows
- Strong analytical, problem-solving, and debugging skills
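To illustrate the level of SQL expected, here is a minimal sketch of a window-function query run from Python; the orders table, its columns, and the data are hypothetical stand-ins for a real warehouse, and it assumes a Python build whose bundled SQLite is 3.25 or newer (the first release with window-function support).

    import sqlite3

    # Hypothetical orders table in an in-memory database, standing in for a warehouse.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (customer_id INTEGER, order_date TEXT, amount REAL);
        INSERT INTO orders VALUES
            (1, '2024-01-05', 120.0),
            (1, '2024-02-10',  80.0),
            (2, '2024-01-20', 200.0);
    """)

    # Window function: per-customer running total of spend, ordered by date.
    query = """
        SELECT customer_id,
               order_date,
               amount,
               SUM(amount) OVER (
                   PARTITION BY customer_id
                   ORDER BY order_date
               ) AS running_total
        FROM orders
    """
    for row in conn.execute(query):
        print(row)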
Responsibilities

- Develop, maintain, and optimize ETL pipelines to ingest, transform, and load data from multiple sources
- Write efficient, scalable, and well-documented Python scripts for data processing and automation
- Design and optimize complex SQL queries, views, and stored procedures for performance and accuracy
- Perform data validation, quality checks, and reconciliation to ensure data integrity (see the sketch after this list)
- Troubleshoot and resolve data pipeline failures and performance issues
- Collaborate with data analysts, data scientists, and business teams to understand data requirements
- Implement best practices for data security, governance, and compliance
- Maintain technical documentation for ETL workflows and data models
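As a flavor of the validation and reconciliation work above, a minimal sketch using only the standard library; the file name, column names, and quality rules are hypothetical:

    import csv

    REQUIRED_COLUMNS = {"order_id", "customer_id", "amount"}

    def validate_extract(path):
        """Basic quality checks: required columns present, keys non-null, amounts non-negative."""
        good, quarantined = [], []
        with open(path, newline="") as f:
            reader = csv.DictReader(f)
            missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
            if missing:
                raise ValueError(f"extract is missing columns: {missing}")
            for row in reader:
                if not row["order_id"] or float(row["amount"]) < 0:
                    quarantined.append(row)  # held back for reconciliation
                else:
                    good.append(row)
        return good, quarantined

    good, quarantined = validate_extract("orders_extract.csv")  # hypothetical extract file
    print(f"{len(good)} rows passed, {len(quarantined)} quarantined")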
Preferred Qualifications

- Experience with cloud platforms (AWS, Azure, or GCP)
- Familiarity with big data technologies (Spark, Hadoop)
- Exposure to workflow schedulers such as Airflow, Prefect, or cron (see the sketch after this list)
- Knowledge of CI/CD pipelines and version control (Git)
- Understanding of data governance and lineage tools
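For the scheduler item, a minimal Airflow DAG sketch in the TaskFlow style; the DAG name, task bodies, and schedule are hypothetical, and the code assumes Airflow 2.4 or newer (where the schedule parameter replaced schedule_interval):

    from datetime import datetime
    from airflow.decorators import dag, task

    @dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
    def orders_etl():
        @task
        def extract():
            # Stand-in for pulling rows from a real source system.
            return [{"order_id": 1, "amount": 120.0}]

        @task
        def load(rows):
            # Stand-in for a warehouse write.
            print(f"loading {len(rows)} rows")

        load(extract())

    orders_etl()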