Roles & Responsibilities:
- Define and drive the test automation strategy for data pipelines, ensuring alignment with enterprise data platform goals.
- Lead and mentor a team of data QA/test engineers, providing technical direction, career development, and performance feedback.
- Own delivery of automated data validation frameworks across real-time and batch data pipelines using Databricks and AWS services.
- Collaborate with data engineering, platform, and product teams to embed data quality checks and testability into pipeline design.
- Design and implement scalable validation frameworks for data ingestion, transformation, and consumption layers.
- Automate validations for multiple data formats including JSON, CSV, Parquet, and other structured/semi-structured file types during ingestion and transformation.
- Automate data testing workflows for pipelines built on Databricks/Spark, integrated with AWS services like S3, Glue, Athena, and Redshift.
- Establish reusable test components for schema validation, null checks, deduplication, threshold rules, and transformation logic (see the PySpark sketch after this list).
- Integrate validation processes with CI/CD pipelines, enabling automated and event-driven testing across the development lifecycle.
- Drive the selection and adoption of tools/frameworks that improve automation, scalability, and test efficiency.
- Oversee testing of data visualizations in Tableau, Power BI, or custom dashboards, ensuring backend accuracy via UI and data-layer validations.
- Ensure accuracy of API-driven data services, managing functional and regression testing via Postman, Python, or other automation tools.
- Track test coverage, quality metrics, and defect trends, providing regular reporting to leadership and ensuring continuous improvement.
- Establish alerting and reporting mechanisms for test failures, data anomalies, and governance violations.
- Contribute to system architecture and design discussions, bringing a strong quality and testability lens early into the development lifecycle.
- Lead test automation initiatives by implementing best practices and scalable frameworks, and embed test suites into CI/CD pipelines to enable automated, continuous validation of data workflows, catalog changes, and visualization updates.
- Mentor and guide QA engineers, fostering a collaborative, growth-oriented culture focused on continuous learning and technical excellence.
- Collaborate cross-functionally with product managers, developers, and DevOps to align quality efforts with business goals and release timelines.
- Conduct code reviews, test plan reviews, and pair-testing sessions to ensure team-level consistency and high-quality standards.
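
As an illustration of the reusable validation components described above (schema validation, null checks, deduplication, and threshold rules), the following is a minimal PySpark sketch; the column names, expected schema, and threshold values are hypothetical placeholders, not part of any existing framework.

```python
from pyspark.sql import DataFrame, SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType


def validate_schema(df: DataFrame, expected: StructType) -> list[str]:
    """Return a list of human-readable schema mismatches (empty list == pass)."""
    actual = {f.name: f.dataType for f in df.schema.fields}
    issues = []
    for field in expected.fields:
        if field.name not in actual:
            issues.append(f"missing column: {field.name}")
        elif actual[field.name] != field.dataType:
            issues.append(
                f"type mismatch on {field.name}: "
                f"expected {field.dataType}, got {actual[field.name]}"
            )
    return issues


def null_check(df: DataFrame, columns: list[str]) -> dict[str, int]:
    """Count nulls per required column; any non-zero count is a failure."""
    counts = df.select(
        [F.count(F.when(F.col(c).isNull(), c)).alias(c) for c in columns]
    ).collect()[0]
    return {c: counts[c] for c in columns if counts[c] > 0}


def duplicate_check(df: DataFrame, key_columns: list[str]) -> int:
    """Return the number of rows that share a business key with another row."""
    return (
        df.groupBy(*key_columns)
        .count()
        .filter(F.col("count") > 1)
        .agg(F.coalesce(F.sum("count"), F.lit(0)))
        .collect()[0][0]
    )


def threshold_check(df: DataFrame, column: str, min_value: float, max_value: float) -> int:
    """Return the number of rows whose value falls outside the allowed range."""
    return df.filter((F.col(column) < min_value) | (F.col(column) > max_value)).count()


if __name__ == "__main__":
    # Hypothetical example: validate a small in-memory orders dataset.
    spark = SparkSession.builder.appName("dq-checks-sketch").getOrCreate()
    expected_schema = StructType(
        [
            StructField("order_id", StringType()),
            StructField("amount", DoubleType()),
        ]
    )
    df = spark.createDataFrame(
        [("o-1", 10.0), ("o-2", 25.5), ("o-2", 25.5), ("o-3", None)],
        schema=expected_schema,
    )
    print("schema issues:", validate_schema(df, expected_schema))
    print("null counts:", null_check(df, ["order_id", "amount"]))
    print("duplicate rows:", duplicate_check(df, ["order_id"]))
    print("out-of-range rows:", threshold_check(df, "amount", 0.0, 100.0))
```

Because each helper returns findings rather than asserting, the same functions can be wrapped in pytest assertions or invoked from an orchestrated CI/CD job, which is how they would plug into the automated, event-driven testing described above.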
Must-Have Skills:
- Hands-on experience with Databricks and Apache Spark for building and validating scalable data pipelines
- Strong expertise in AWS services including S3, Glue, Athena, Redshift, and Lake Formation
- Proficient in Python, PySpark, and SQL for developing test automation and validation logic
- Experience validating data from various file formats such as JSON, CSV, Parquet, and Avro
- In-depth understanding of data integration workflows including batch and real-time (streaming) pipelines
- Strong ability to define and automate data quality checks: schema validation, null checks, duplicates, thresholds, and transformation validation
- Experience designing modular, reusable automation frameworks for large-scale data validation
- Skilled in integrating tests with CI/CD tools like GitHub Actions, Jenkins, or Azure DevOps
- Familiarity with orchestration tools such as Apache Airflow, Databricks Jobs, or AWS Step Functions
- Hands-on experience with API testing using Postman, pytest, or custom automation scripts (see the pytest sketch after this list)
- Proven track record of leading and mentoring QA/test engineering teams
- Ability to define and own the test automation strategy and roadmap for data platforms
- Strong collaboration skills to work with engineering, product, and data teams
- Excellent communication skills for presenting test results, quality metrics, and project health to leadership
- Contributions to internal quality dashboards or data observability systems
- Awareness of metadata-driven testing approaches and lineage-based validations
- Experience working with Agile testing methodologies such as Scaled Agile (SAFe).
- Familiarity with automated testing frameworks like Selenium, JUnit, TestNG, or PyTest.
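
As an illustration of the pytest-driven API testing listed above, the sketch below exercises a fictional data-service endpoint; the base URL, query parameters, and field names are assumptions for demonstration purposes only.

```python
import pytest
import requests

# Hypothetical endpoint for a data service; replace with the real service URL.
BASE_URL = "https://example.internal/api/v1"


@pytest.fixture(scope="session")
def session():
    """Share one HTTP session (and any common headers) across tests."""
    with requests.Session() as s:
        s.headers.update({"Accept": "application/json"})
        yield s


def test_orders_endpoint_returns_expected_fields(session):
    """Functional check: response is 200 and each record carries the agreed fields."""
    response = session.get(f"{BASE_URL}/orders", params={"limit": 10}, timeout=30)
    assert response.status_code == 200

    records = response.json()
    assert isinstance(records, list)
    required_fields = {"order_id", "amount", "created_at"}
    for record in records:
        missing = required_fields - record.keys()
        assert not missing, f"record missing fields: {missing}"


def test_orders_amounts_are_non_negative(session):
    """Regression-style data rule: amounts should never go negative."""
    response = session.get(f"{BASE_URL}/orders", params={"limit": 10}, timeout=30)
    response.raise_for_status()
    bad = [r for r in response.json() if r.get("amount", 0) < 0]
    assert not bad, f"negative amounts returned: {bad}"
```

A suite like this can be triggered from CI/CD tools such as GitHub Actions or Jenkins with a plain pytest invocation, so the API checks run in the same pipeline as the data validations.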
Good-to-Have Skills:
- Experience with data governance tools such as Apache Atlas, Collibra, or Alation
- Understanding of DataOps methodologies and practices
- Familiarity with monitoring/observability tools such as Datadog, Prometheus, or CloudWatch
- Experience building or maintaining test data generators (see the sketch after this list)
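
For the test data generator item above, a minimal Python sketch might look like the following; the record shape, anomaly-injection rate, and output formats (JSON and CSV) are illustrative assumptions only.

```python
import csv
import json
import random
import uuid
from datetime import datetime, timedelta, timezone


def generate_orders(count: int, anomaly_rate: float = 0.05) -> list[dict]:
    """Produce synthetic order records, occasionally injecting bad rows
    (null or negative amounts) so validation logic has known defects to catch."""
    now = datetime.now(timezone.utc)
    records = []
    for _ in range(count):
        record = {
            "order_id": str(uuid.uuid4()),
            "amount": round(random.uniform(1.0, 500.0), 2),
            "created_at": (now - timedelta(minutes=random.randint(0, 1440))).isoformat(),
        }
        if random.random() < anomaly_rate:
            record["amount"] = random.choice([None, -1.0])  # deliberate data-quality defect
        records.append(record)
    return records


def write_json(records: list[dict], path: str) -> None:
    with open(path, "w") as f:
        json.dump(records, f, indent=2)


def write_csv(records: list[dict], path: str) -> None:
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["order_id", "amount", "created_at"])
        writer.writeheader()
        writer.writerows(records)


if __name__ == "__main__":
    data = generate_orders(100)
    write_json(data, "orders.json")
    write_csv(data, "orders.csv")
```

Deliberately injecting a small fraction of defective rows gives the validation framework known failures to detect, which is a common way to exercise schema, null, and threshold checks end to end.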
Education and Professional Certifications:
- Bachelor's or Master's degree in Computer Science or Engineering preferred