Key Responsibilities
- ETL Automation Frameworks: Design, develop, and maintain scalable ETL test automation frameworks to validate data pipelines across ingestion, transformation, and load layers.
- Data Validation Automation: Automate validation of source-to-target mappings, business rules, transformations, and aggregations across large datasets.
- Data Quality & Reconciliation: Perform automated data quality checks, reconciliation testing, and regression testing to ensure data accuracy, completeness, and consistency.
- Databricks & Spark Testing: Validate Databricks notebooks, Spark jobs, workflows, and Delta tables, including batch and incremental processing.
- SQL-Based Validation: Write and optimize complex SQL queries for data validation, anomaly detection, and reconciliation across data sources.
- Python / PySpark Automation: Develop automation scripts using Python and PySpark for validating large-scale data processing logic.
- End-to-End Testing: Design and execute end-to-end test scenarios and test cases, and define acceptance criteria, for data engineering deliverables such as ETL pipelines and data transformations.
- CI/CD Integration: Integrate ETL automation suites with CI/CD pipelines to support continuous testing and deployment.
- Documentation & Process: Create and maintain test plans, test strategies, execution reports, and clearly defined entry/exit criteria for releases.
- Collaboration & Mentorship: Work closely with Data Engineers, DevOps, and Product teams to resolve data issues; mentor junior QA engineers on automation best practices and debugging techniques.
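
For illustration only, the source-to-target reconciliation work described above might resemble the following minimal Python sketch. The function name `reconcile`, the `id` key, and the sample rows are hypothetical and not taken from any actual project; a production suite would more likely run comparable logic in PySpark or SQL against the real tables.

```python
from typing import Any, Dict, Iterable, List, Mapping

Row = Mapping[str, Any]


def reconcile(source: Iterable[Row], target: Iterable[Row], key: str) -> Dict[str, List[Any]]:
    """Compare source and target rows by a unique key (hypothetical helper).

    Returns the keys that are missing in the target, unexpected in the
    target, or present in both but with differing column values.
    """
    # Index both sides by the key column; assumes the key is unique per side.
    src = {row[key]: dict(row) for row in source}
    tgt = {row[key]: dict(row) for row in target}

    return {
        "missing_in_target": sorted(src.keys() - tgt.keys()),
        "unexpected_in_target": sorted(tgt.keys() - src.keys()),
        "mismatched": sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k]),
    }


# Made-up sample data, for demonstration only.
source_rows = [
    {"id": 1, "amount": 100},
    {"id": 2, "amount": 250},
    {"id": 3, "amount": 75},
]
target_rows = [
    {"id": 1, "amount": 100},
    {"id": 2, "amount": 205},  # value differs from source
    {"id": 4, "amount": 10},   # not present in source
]

report = reconcile(source_rows, target_rows, key="id")
print(report)
```

A check of this shape returns a structured report rather than failing on the first difference, which makes it straightforward to assert on in a CI/CD test step.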
________________________________________
Required Skills
- Experience: 10+ years of experience in ETL automation testing
- Databricks & Spark: Hands-on experience with Databricks and Apache Spark
- SQL: Advanced SQL skills for complex data validation
- Automation: Strong Python / PySpark automation experience
- Data Platforms: Experience testing Data Warehouses and Data Lakes
- Process: Experience working in Agile environments with CI/CD pipelines